intel / auto-round

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
https://arxiv.org/abs/2309.05516
Apache License 2.0

Ref GPTQModel for both quant and inference #196

Closed Qubitium closed 1 month ago

Qubitium commented 3 months ago

GPTQModel has fully integrated AutoRound since v0.9.6. This PR adds a reference to GPTQModel for both the quantization step using AutoRound and inference.

wenhuach21 commented 3 months ago

Hi @Qubitium,

Thank you for your great work on GPTQModel.

1. Since you are using a different API, to avoid confusing users we could add a community section and link to your README instead. What do you think?

2. After the release of auto-round v0.3, we intend to make the auto-round format the default so that a unified API can support CPU, HPU, and CUDA. However, due to certain constraints, we were unable to pack the CUDA kernels (referred to as v2 in your terminology) into our package. I was therefore wondering if you could consider splitting the kernel part into a separate Git repository, similar to what AutoAWQ did.

wenhuach21 commented 1 month ago

https://github.com/intel/auto-round/pull/266