tenstorrent / tt-forge-fe

The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their performance and efficiency.
https://docs.tenstorrent.com/tt-forge-fe/
Apache License 2.0

[CI] Obtain higher DRAM silicon runner #182

Open nvukobratTT opened 2 months ago

nvukobratTT commented 2 months ago

Right now, loading Llama 3B uses more than 32 GB of DRAM (closer to 64 GB).
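To make that observation reproducible on a given runner, something like the sketch below could print the host's total RAM and the process's peak RSS around the compile step. This is a hypothetical illustration, not code from the repo: it assumes Linux (where `ru_maxrss` is reported in kilobytes) and the third-party `psutil` package.

```python
# Hypothetical measurement sketch (not from this repo): report total host RAM
# and the process's peak RSS around a model compile, to reproduce the
# >32 GB observation. Assumes Linux and the third-party psutil package.
import resource

import psutil


def peak_rss_gb() -> float:
    # On Linux, ru_maxrss is reported in kilobytes; convert to GB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)


print(f"Total host RAM: {psutil.virtual_memory().total / 1024 ** 3:.1f} GB")
# ... load and compile the model here, e.g. via the tt-forge-fe APIs ...
print(f"Peak RSS so far: {peak_rss_gb():.1f} GB")
```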

Related issue:

vmilosevic commented 2 months ago

Asking @teijo about this to see what is possible and whether tt-metal hit similar problems.

vmilosevic commented 2 months ago

@teijo Metalium uses the same instance flavors (CPUs, RAM, disk), so if there is a fundamental memory requirement that doesn't fit into the current flavors, they either have not bumped into it, or they run it on dedicated bare-metal servers.

If it looks like the currently available memory sizes aren't sufficient for these models, we'll need to look into adding larger flavors.

I think the most likely scenario is that we'll add a double-sized flavor with twice the CPUs, memory, cards, and disk. Doubling everything avoids fragmenting resources, e.g., a server where one instance eats all the RAM but leaves 7 cards unusable.
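Until such a flavor exists, one possible stopgap (a hypothetical sketch, not something from this repo's CI) is a pytest guard that skips memory-heavy model tests on undersized runners. The 64 GB threshold is an assumption based on the observation above, and `psutil` is a third-party dependency.

```python
# Hypothetical CI guard (not from this repo): skip memory-heavy model tests
# on runners with too little RAM until a double-sized flavor is available.
import psutil
import pytest

REQUIRED_GB = 64  # assumed from the "closer to 64 GB" observation above

requires_large_runner = pytest.mark.skipif(
    psutil.virtual_memory().total < REQUIRED_GB * 1024 ** 3,
    reason=f"needs a runner with >= {REQUIRED_GB} GB RAM",
)


@requires_large_runner
def test_llama_3b_compile():
    # Placeholder: the real test would load and compile Llama 3B here.
    ...
```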

nvukobratTT commented 1 month ago

@vmilosevic I managed to reduce the compile-time memory requirements for this model, so I'm moving this issue to the P2 state for the Llama 3B milestone. If that changes, I'll add more details to this issue.

Issue with more details: