Open yieldthought opened 3 months ago
Initial implementation is now on main with working prefill+decode for relatively small sequence lengths. @sraizada-tt is updating the attention to use FlashDecode, which will unlock longer sequence lengths and improve performance.
Also todo:
- di/dt issue fixed here: https://github.com/tenstorrent/tt-metal/issues/11354
- Initial batch of CI tests now in main.
- Bring up Llama 3.1 8b on n150