intel / auto-round

Advanced Quantization Algorithm for LLMs/VLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
https://arxiv.org/abs/2309.05516
Apache License 2.0

Serialization in multiple formats #234

Closed benjamin-marie closed 3 months ago

benjamin-marie commented 3 months ago

When I serialize the model, I would like to save it in all the available formats, e.g., GPTQ, AWQ, and AutoRound. However, this doesn't seem possible: if I first save in the GPTQ format and then try the AutoRound format, the second save fails. Is the model discarded from memory once it is serialized? It seems that I have to rerun the quantization in order to serialize it in another format.

wenhuach21 commented 3 months ago

If you are using the API, set `inplace` to `False` in `save_quantized`. If you are using the example script, we will add support for this tomorrow.
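A minimal sketch of the workflow this answer implies, assuming the `AutoRound(model, tokenizer, ...)` / `quantize()` / `save_quantized(..., format=..., inplace=...)` interface from the project's README; the exact format strings and the small demo model are assumptions, not taken from this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model, for illustration only
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Run AutoRound quantization once.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()

# inplace=False leaves the in-memory quantized model untouched, so the
# same quantization result can be serialized again in other formats
# without rerunning the tuning.
autoround.save_quantized("opt-125m-gptq", format="auto_gptq", inplace=False)
autoround.save_quantized("opt-125m-awq", format="auto_awq", inplace=False)

# The final save can be done in place, since the model is no longer needed.
autoround.save_quantized("opt-125m-autoround", format="auto_round", inplace=True)
```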

benjamin-marie commented 3 months ago

`inplace` is what I was looking for. I'm not sure how I missed it! Thanks a lot.