Closed shifeiwen closed 2 months ago
Hi @shifeiwen great question.
We are aware of this issue and are already working on generic quantization recipes to share with our developer community. We intentionally released the model in staging to unblock existing developers with pre-computed encodings.
Existing encodings should work with a fine-tuned model, with reduced accuracy, at this moment. You can surely give it a try. We are actively working on a workflow that allows developers to bring and quantize their own fine-tuned models.
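As a rough illustration of why reusing pre-computed encodings on a fine-tuned model is only a partial fit: an encodings file is typically a JSON map from tensor names to quantization parameters (scale, offset, bitwidth). You can compare its tensor names against your fine-tuned checkpoint's parameters to see what is covered. All names and values below are hypothetical, not the actual files shipped with the model:

```python
import json

# Hypothetical contents of a pre-computed .encodings JSON file:
# tensor name -> quantization parameters (scale, offset, bitwidth).
encodings = {
    "param_encodings": {
        "layers.0.attn.weight": [{"scale": 0.004, "offset": 0, "bitwidth": 8}],
    },
    "activation_encodings": {
        "layers.0.attn.out": [{"scale": 0.021, "offset": -128, "bitwidth": 8}],
    },
}

# Illustrative tensor names from a fine-tuned checkpoint.
finetuned_params = {"layers.0.attn.weight", "layers.0.mlp.weight"}

encoded = set(encodings["param_encodings"])
covered = sorted(finetuned_params & encoded)  # params the encodings cover
missing = sorted(finetuned_params - encoded)  # params with no encoding

print("covered:", covered)
print("missing:", missing)
```

Note that even for covered tensors, the scale and offset were calibrated on the original weights, which is why accuracy degrades when the same encodings are applied to fine-tuned weights.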
Please stay connected on GitHub and the Slack community for the latest updates once it's released.
Is your feature request related to a problem? Please describe. I am trying to migrate other LLM models using AI Hub, but I did not find a document explaining how to migrate a model. After testing Llama 2, it seems that the model's quantization encoding file is downloaded, not generated by code. I understand that if I fine-tune Llama 2, my model may not work with these quantization files.