tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Llama 3.2 #13368

Open yieldthought opened 1 week ago

yieldthought commented 1 week ago

Bring up Llama 3.2 model family on Wormhole, T3K and TG

cglagovichTT commented 1 week ago

10/2 update:

What's next:

cglagovichTT commented 1 week ago

Llama3.2-11B-Vision bringup

Text model

Vision model

The new tests depend on local llama-models changes that I still need to figure out how to share. You also have to install some new packages:

pip install -r ../llama-models/requirements.txt

LayerNorm

No issues

ImageFeedForward

Has bias, uses GELU as activation. Only two linears.
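For reference, the structure described above (two biased linears with a GELU in between) can be sketched in plain Python. This is only an illustration, not the tt-metal or Meta implementation: the function names, the list-based tensor layout, and the choice of the exact erf form of GELU are all assumptions.

```python
import math

def gelu(x):
    # Exact (erf-based) GELU: x * Phi(x). Assumption: the reference model may
    # use a different (e.g. tanh) approximation.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(x, weight, bias):
    # x: list of in_dim floats; weight: out_dim rows of in_dim floats; bias: out_dim floats.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def image_feed_forward(x, w1, b1, w2, b2):
    # Hypothetical helper: linear -> GELU -> linear, both linears with bias.
    hidden = [gelu(h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

# Tiny usage example with made-up weights.
out = image_feed_forward(
    [1.0, -1.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[1.0, 1.0]], b2=[0.5],
)
```

Unlike a gated (SwiGLU-style) MLP, there is no third projection here, which is what "only two linears" refers to.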

ImageAttention

Very similar to Attention, but it does not generate a cache! It's MHA. Not a great shape, though:

ImageAttention: dim=1280, head_dim=80, n_heads=16

It also requires an attention mask, which means we need to support non-causal attention in SDPA.

Meta does something strange with qkvo replication which I don't understand: https://github.com/meta-llama/llama-models/blob/main/models/llama3/reference_impl/multimodal/model.py#L254
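To make the non-causal requirement concrete, here is a minimal single-head sketch in plain Python of attention with an explicit mask and no KV cache. It is not the tt-metal SDPA op or Meta's code: `masked_attention` is a hypothetical helper, and treating the mask as additive (0.0 to keep a key, -inf to drop it) is an assumption.

```python
import math

# Shape quoted above: 16 heads of width 80 make up dim=1280.
DIM, N_HEADS, HEAD_DIM = 1280, 16, 80
assert N_HEADS * HEAD_DIM == DIM

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(q, k, v, mask):
    # q: [seq_q][head_dim]; k, v: [seq_k][head_dim]; mask: [seq_q][seq_k], additive.
    # No causal structure is assumed: any query may attend to any unmasked key.
    # Rows that are fully masked are not handled in this sketch.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row, mask_row in zip(q, mask):
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) + m
                  for k_row, m in zip(k, mask_row)]
        probs = softmax(scores)
        out.append([sum(p * v_row[d] for p, v_row in zip(probs, v))
                    for d in range(len(v[0]))])
    return out

NEG_INF = float("-inf")
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
v = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
# The first query sees keys 0 and 1 (including a "later" position); key 2 is masked out.
result = masked_attention(q, k, v, [[0.0, 0.0, NEG_INF]])
```

A causal-only SDPA would instead force a lower-triangular pattern, so it cannot express a mask like the one above where position 0 attends to position 1.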