quic / ai-hub-models

The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
https://aihub.qualcomm.com
BSD 3-Clause "New" or "Revised" License

[Feature Request] export other model document #72

Closed: shifeiwen closed this issue 2 months ago

shifeiwen commented 2 months ago

Is your feature request related to a problem? Please describe.

I am trying to migrate other LLMs using AI Hub, but I could not find any documentation explaining how to migrate a model. After testing Llama 2, I found that the model's quantization encodings file appears to be downloaded rather than generated by code. As I understand it, if I fine-tune Llama 2, my model may not work with these pre-computed encodings.

bhushan23 commented 2 months ago

Hi @shifeiwen, great question.

We are aware of this issue and are already working on generic quantization recipes to share with our developer community. We intentionally released the model in staging, with pre-computed encodings, so that existing developers would not be blocked from getting started.

For now, the existing encodings should still work with a fine-tuned model, though with reduced accuracy. You are welcome to give it a try. We are actively developing a workflow that lets developers bring and quantize their own fine-tuned models.
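
To illustrate why reusing encodings on a fine-tuned model can cost accuracy, here is a minimal, self-contained sketch in plain Python. It is not the AI Hub workflow or its encodings file format, neither of which is described in this thread. It assumes a simple 8-bit asymmetric min/max quantizer, and the "fine-tuned" tensor is synthetic: the base values scaled by 1.5 plus a little noise. All function names are hypothetical.

```python
import random


def compute_encoding(values, bits=8):
    """Derive a min/max encoding (scale, offset, levels) from sample values.

    Assumption: asymmetric quantization whose range always includes 0.0.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** bits - 1
    scale = ((hi - lo) / levels) or 1.0  # avoid a zero scale for constant input
    offset = round(lo / scale)           # integer offset so that lo maps near 0
    return scale, offset, levels


def fake_quantize(values, encoding):
    """Quantize to the integer grid, clamp to its range, then dequantize."""
    scale, offset, levels = encoding
    result = []
    for v in values:
        q = round(v / scale) - offset
        q = max(0, min(levels, q))       # values outside the range are clipped
        result.append((q + offset) * scale)
    return result


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Stand-in for the original model's weights or activations.
base = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# Synthetic stand-in for a fine-tuned model: a wider, slightly shifted spread.
tuned = [w * 1.5 + rng.gauss(0.0, 0.1) for w in base]

stale = compute_encoding(base)   # pre-computed on the original model
fresh = compute_encoding(tuned)  # recomputed on the fine-tuned model

err_stale = mean_squared_error(tuned, fake_quantize(tuned, stale))
err_fresh = mean_squared_error(tuned, fake_quantize(tuned, fresh))
print(f"MSE with pre-computed encodings: {err_stale:.6f}")
print(f"MSE with recomputed encodings:   {err_fresh:.6f}")
```

In this toy setup the pre-computed range is too narrow for the fine-tuned values, so the largest ones are clipped and the reconstruction error rises. How large the effect is for a real fine-tuned LLM depends on how far its value ranges drift from those of the original model.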

Please stay connected on GitHub and the Slack community for the latest updates once it is released.