quic / efficient-transformers

This library empowers users to seamlessly port pretrained models and checkpoints from the Hugging Face (HF) hub (developed using the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
https://quic.github.io/efficient-transformers/

Any plans on supporting Llama3.2 text and multimodal on Qualcomm AI 100? #152

Open alew3 opened 1 week ago

alew3 commented 1 week ago

Do you plan on supporting Llama 3.2 (text/multimodal) on Qualcomm Cloud AI 100? I saw this post (https://www.qualcomm.com/news/onq/2024/09/qualcomm-partners-with-meta-to-support-llama-3-point-2-big-deal-for-on-device-ai), but it seems to cover only Snapdragon chips.

alew3 commented 1 week ago

BTW, your README link to "models coming soon" is broken (https://github.com/quic/efficient-transformers/blob/main/models-coming-soon).

quic-rishinr commented 1 week ago

Hi Aless. The Llama 3.2 1B and 3B models work out of the box in the current repository, provided you use one of the latest product software releases. Could you share which Qualcomm Cloud AI 100 instance and software SDK version you are using?
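For the 1B/3B case, usage should follow the repository's documented high-level API. This is a sketch, not an exact recipe: the checkpoint name, `num_cores` value, and prompt are illustrative assumptions, and exact method signatures can vary between SDK releases.

```python
# Sketch of the QEfficient high-level flow for a Llama 3.2 text model.
# Assumptions: checkpoint id, num_cores, and prompt are illustrative only.
from transformers import AutoTokenizer
from QEfficient import QEFFAutoModelForCausalLM

model_name = "meta-llama/Llama-3.2-1B"  # assumed HF checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)

# Export to ONNX and compile a binary targeting the Cloud AI 100 card
model.compile(num_cores=14)

# Run inference on-device
model.generate(prompts=["What is Qualcomm Cloud AI 100?"], tokenizer=tokenizer)
```

Note that compilation and generation require a machine with a Cloud AI 100 card and the platform SDK installed.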

The changes for the larger Llama 3.2 text models (11B and 90B) are currently under review. If you would like to run these models, you can cherry-pick the changes from #134 onto mainline and proceed with validation.

Regarding the Llama 3.2 multimodal models, they are still under evaluation. I will keep you updated on any progress.

Additionally, I will address the issue with the broken README link.