pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

How to deploy llama2 on Qualcomm Snapdragon chips through ExecuTorch? #1162

Open tensorflowt opened 11 months ago

tensorflowt commented 11 months ago

Excuse me, if I need to deploy llama2 on a Qualcomm Snapdragon chip through ExecuTorch and want to use the NPU as the inference compute unit, what do I need to do?

The module I'm currently using is the SG885G-WF: https://www.quectel.com/product/wi-fi-bt-sg885g-wf-smart-module

iseeyuan commented 11 months ago

Thanks for the request! We are working on this and will get back to you when there's something we can share.

jingcheng88 commented 10 months ago

@iseeyuan Is there a rough schedule or timeline for this? Qualcomm plans to make Llama 2-based AI implementations available on flagship smartphones starting in 2024.

escorciav commented 9 months ago

Will ExecuTorch's "varied input len" support deal with QNN's constraints?

Do you mind sharing any ideas? :blush:

mergennachin commented 4 months ago

@iseeyuan can you update this issue please?

iseeyuan commented 4 months ago

It's the same as https://github.com/pytorch/executorch/issues/3586. It's a work in progress. From @cccclai:

We only have enablement for the small stories models right now. We're actively working on enabling llama2 and improving its performance numbers.