Open nvukobratTT opened 2 months ago
Asking @teijo about this, to see what is possible and whether tt-metal has had similar problems.
@teijo Metalium uses the same instance flavors (i.e. the same CPUs, RAM, and disk), so if there is a fundamental memory requirement that doesn’t fit into the current flavors, they either have not bumped into it or they run on dedicated bare-metal servers.
If it looks like the currently available memory sizes aren’t working for the models, we’ll need to look into making larger ones.
I think the most likely scenario is that we’ll make a double-sized flavor that has twice the CPU, memory, cards, and disk to accommodate that. Doubling everything lets us avoid fragmentation of resources, e.g. a server with 1 instance that eats all the RAM but leaves 7 cards unusable.
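To make the fragmentation argument concrete, here is a rough sketch of the arithmetic. All of the numbers are hypothetical (an 8-card server and a made-up base flavor, not the real flavor definitions), and `Resources` and `pack` are names invented for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    ram_gb: int
    cards: int
    disk_gb: int

    def scaled(self, factor: int) -> "Resources":
        # Scale every dimension by the same factor (the "double of everything" idea).
        return Resources(
            self.cpus * factor,
            self.ram_gb * factor,
            self.cards * factor,
            self.disk_gb * factor,
        )


def pack(server: Resources, flavor: Resources) -> tuple[int, Resources]:
    """Fill a server with instances of one flavor; return (count, leftover)."""
    # The scarcest dimension decides how many instances fit.
    count = min(
        server.cpus // flavor.cpus,
        server.ram_gb // flavor.ram_gb,
        server.cards // flavor.cards,
        server.disk_gb // flavor.disk_gb,
    )
    leftover = Resources(
        server.cpus - count * flavor.cpus,
        server.ram_gb - count * flavor.ram_gb,
        server.cards - count * flavor.cards,
        server.disk_gb - count * flavor.disk_gb,
    )
    return count, leftover


# Hypothetical numbers, chosen only so that the base flavor tiles the server evenly.
server = Resources(cpus=64, ram_gb=512, cards=8, disk_gb=4000)
base = Resources(cpus=8, ram_gb=64, cards=1, disk_gb=500)

# Doubling every dimension still tiles the server with nothing stranded.
double_count, double_left = pack(server, base.scaled(2))

# Bumping only RAM strands the remaining cards, CPUs, and disk.
ram_heavy = Resources(cpus=8, ram_gb=512, cards=1, disk_gb=500)
heavy_count, heavy_left = pack(server, ram_heavy)

print(double_count, double_left)
print(heavy_count, heavy_left)
```

With these made-up numbers the doubled flavor fits 4 times with nothing left over, while the RAM-only flavor fits once and leaves 7 cards idle, which is the case described above.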
@vmilosevic I managed to reduce the compile memory requirements for this model. I'm moving this issue into the P2 state for the Llama 3B milestone. If that changes, I'll raise more details on this issue.
Issue with more details:
Right now, loading Llama 3B uses more than 32 GB of DRAM (closer to 64 GB).
Related issue: