microsoft / VPTQ

VPTQ, A Flexible and Extreme low-bit quantization algorithm
MIT License

How can I use VPTQ to quantize my own models? #56

Open IEI-mjx opened 1 week ago

IEI-mjx commented 1 week ago

As far as I can see, the quantization method itself is not provided in this project. All of the examples show how to run inference with VPTQ models, rather than how to quantize them. Or I might have misunderstood this project; could you show me an example of how to quantize a model with VPTQ?

YangWang92 commented 1 week ago

Please hold on, I will release the quantization algorithm in the next 1-2 weeks, and you are welcome to test it.

IEI-mjx commented 1 week ago

Thank you for replying! I will keep focusing on the project!

gtkacz commented 4 days ago

@YangWang92 will this work with BERT models too?

ponponon commented 4 days ago

Please hold on, I will release the quantization algorithm in the next 1-2 weeks, and you are welcome to test it.

Is the GPTQ in qwen/Qwen2-VL-7B-Instruct-GPTQ-INT4 similar to VPTQ? Are both about model quantization?

Does VPTQ have any unique advantages over GPTQ?

ponponon commented 4 days ago

Please hold on, I will release the quantization algorithm in the next 1-2 weeks, and you are welcome to test it.

Can it be used to quantize ZhipuAI/glm-4v-9b?

https://huggingface.co/THUDM/glm-4v-9b

https://modelscope.cn/models/ZhipuAI/glm-4v-9b

YangWang92 commented 3 days ago

@YangWang92 will this work with BERT models too?

I have not verified whether VPTQ works on encoder models, but my intuition is that quantizing an encoder should be easier. For example, methods similar to GPTQ perform reasonably well on BERT. I will test this while quantizing VLMs' encoders as well.

YangWang92 commented 3 days ago

Please hold on, I will release the quantization algorithm in the next 1-2 weeks, and you are welcome to test it.

Is the GPTQ in qwen/Qwen2-VL-7B-Instruct-GPTQ-INT4 similar to VPTQ? Are both about model quantization?

Does VPTQ have any unique advantages over GPTQ?

VPTQ achieves better accuracy at <3 bits; you can check our early results in the tech report: https://github.com/microsoft/VPTQ/blob/main/VPTQ_tech_report.pdf
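The key difference is that GPTQ quantizes each weight to a scalar grid, while VPTQ groups weights into short vectors and maps each vector to an entry in a learned codebook, which is what makes sub-3-bit rates feasible. Below is a minimal, illustrative sketch of that vector-quantization idea using plain k-means; it is NOT the actual VPTQ algorithm (which also uses second-order information and residual codebooks), and all names here are made up for the example.

```python
# Illustrative vector quantization of a weight matrix (NOT the VPTQ algorithm).
import numpy as np

def vector_quantize(weights, vec_dim=4, codebook_size=16, iters=20, seed=0):
    """Split weights into vectors of length vec_dim and replace each with
    its nearest centroid from a small k-means codebook."""
    rng = np.random.default_rng(seed)
    flat = weights.reshape(-1, vec_dim)                  # one row per vector
    codebook = flat[rng.choice(len(flat), codebook_size, replace=False)].copy()
    for _ in range(iters):                               # Lloyd's k-means
        dists = ((flat[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(codebook_size):
            members = flat[assign == k]
            if len(members):
                codebook[k] = members.mean(0)
    dists = ((flat[:, None, :] - codebook[None]) ** 2).sum(-1)
    indices = dists.argmin(1)                            # what gets stored
    dequantized = codebook[indices].reshape(weights.shape)
    bits_per_weight = np.log2(codebook_size) / vec_dim   # index bits / vector dim
    return dequantized, indices, codebook, bits_per_weight

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
W_hat, idx, cb, bits = vector_quantize(W)
print(f"{bits} bits/weight, MSE={((W - W_hat) ** 2).mean():.4f}")
```

With a 16-entry codebook over 4-dimensional vectors, each weight costs log2(16)/4 = 1 bit of index storage (plus the small shared codebook), which is why vector quantization can reach rates where a scalar grid would have fewer than two levels per weight.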

YangWang92 commented 3 days ago

Please hold on, I will release the quantization algorithm in the next 1-2 weeks, and you are welcome to test it.

Can it be used to quantize ZhipuAI/glm-4v-9b?

https://huggingface.co/THUDM/glm-4v-9b

https://modelscope.cn/models/ZhipuAI/glm-4v-9b

I am still working on VLMs; please wait a few weeks.