Open wenhuach21 opened 2 months ago
Sure! We welcome open-source innovations to be integrated into LlamaFactory. Currently we are using the PEFT library to support QLoRA fine-tuning. Feel free to submit a PR and we will review it soon.
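For readers following along, QLoRA-style fine-tuning with PEFT generally looks like the sketch below: the base model is loaded 4-bit-quantized through bitsandbytes, and LoRA adapters are trained on top. The model name is only an example and exact defaults vary across versions.

```python
# A minimal QLoRA-style sketch using the PEFT library (not LlamaFactory's
# actual training code): base weights load in 4-bit via bitsandbytes,
# then LoRA adapters are attached for fine-tuning.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example model, not prescribed here
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```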
Thank you!
I'm currently learning your code and plan to implement the following changes:

- Add YAML configurations: introduce new options in YAML to configure the quantization process (see the sketch after this list).
- Export quantized models: export the quantized model to GPTQ or AWQ formats so it can leverage the existing pipeline.
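As far as I can tell, LlamaFactory parses YAML configs into dataclass arguments through HfArgumentParser, so the new YAML keys might map to fields like the ones below. This is only a rough sketch; every field name here is a hypothetical placeholder, not an existing LlamaFactory argument.

```python
# Hypothetical export-time quantization options in the HfArgumentParser
# dataclass style; none of these field names exist in LlamaFactory today.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuantizationExportArguments:
    quant_method: Optional[str] = field(
        default=None,
        metadata={"help": "Quantization algorithm to run before export, e.g. `auto_round`."},
    )
    quant_bits: int = field(
        default=4,
        metadata={"help": "Bit width for weight quantization."},
    )
    quant_group_size: int = field(
        default=128,
        metadata={"help": "Group size used when quantizing weights."},
    )
    export_quant_format: Optional[str] = field(
        default=None,
        metadata={"help": "Target format for the exported model: `gptq` or `awq`."},
    )
```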
Potential limitations:

- Model support: currently we only support LLMs (large language models); we do not yet have a unified API for multimodal models.
- Bits support: our focus is primarily on 4-bit precision. For 2-bit precision, GPTQ asymmetric kernels have accuracy issues, and symmetric quantization has notably lower performance.
Potential issue (UI): since the quantization process takes around 20 minutes for a 7B model and about 3 hours for a 70B model on CUDA, the web UI code may need some changes, but I am not familiar with that part.
Please let me know if you have any feedback or suggestions.
Hi @wenhuach21, thanks for the information. It's okay to skip support for multimodal models for now. Also, since users mostly use 4-bit quantization, it is not necessary to implement 2-bit quantization. Concerning the Web UI, don't worry about it: we'll take care of the Web UI support so that you can focus on the algorithms.
Best.
Reminder
System Info
None
Reproduction
None
Expected behavior
None
Others
Hi, thank you for the fantastic work on LLaMA Factory! I've noticed that the repository supports both quantized models generated by various algorithms and on-the-fly quantization.
I am curious whether LLaMA Factory is open to contributions of quantization algorithms that are not performed on the fly. We have open-sourced AutoRound, which serves as a strong alternative to existing methods, and we would be happy to contribute it if that's okay with you.
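For reference, offline quantization with AutoRound looks roughly like the following. This is a sketch based on the AutoRound README; exact argument names and the export format strings may differ between versions, so treat them as assumptions.

```python
# A rough sketch of offline (not on-the-fly) quantization with AutoRound.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Llama-2-7b-hf"  # example model, not prescribed in the thread
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# AutoRound tunes the weight rounding with signed gradient descent
# and produces an ordinary quantized checkpoint.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()

# Exporting in a GPTQ-compatible layout lets existing inference kernels
# (and pipelines such as LlamaFactory's) load the model directly.
autoround.save_quantized("./llama2-7b-autoround-4bit", format="auto_gptq")
```

The point of exporting to a GPTQ- or AWQ-compatible layout, as proposed above, is that the resulting checkpoint plugs into kernels and loaders that already exist.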