tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

Llama 3.1 8b #10692

Open yieldthought opened 3 months ago

yieldthought commented 3 months ago

Bring up Llama 3.1 8b on n150

yieldthought commented 3 months ago

The initial implementation is now on main, with prefill and decode working for relatively small sequence lengths. @sraizada-tt is updating the attention to use FlashDecode, which will unlock longer sequence lengths and improve performance.

yieldthought commented 3 months ago

Also todo:

mtairum commented 2 months ago

di/dt issue fixed here: https://github.com/tenstorrent/tt-metal/issues/11354

Initial batch of CI tests now in main.