quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

Running Model Locally with Custom Prompts #96

Closed xuandy05 closed 6 days ago

xuandy05 commented 2 weeks ago

Hello, I am trying to run Llama-v3-8B-Chat on my Android phone using the NPU. After exporting the model to the optimized Qualcomm format, how can I run it locally on my mobile device with my own prompts? Thank you!

bhushan23 commented 2 weeks ago

Hi @xuandy05, we will release a new variant of llama3 that is compatible with running on-device. Please follow https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized/gen_ondevice_llama to run llama2 on-device.

llama3 will also use a similar workflow to run on-device. NOTE: you can use the current llama3 with the above workflow, but it will require changes to the config file. Please stay tuned; we will post an update once the llama3 flow is released.