Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
Feature requests
1. Support different kernels in different backends, including GPTQ/AWQ/ITREX.
2. Support different bits and group_size for different layers.
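For the second request, a minimal sketch of what per-layer bit-width and group-size selection could look like. All names here (`LAYER_OVERRIDES`, `layer_quant_config`, the pattern syntax) are hypothetical illustrations, not the project's actual API: layer-name patterns map to `bits`/`group_size` overrides, with a global default as the fallback.

```python
# Hypothetical per-layer quantization config; names are illustrative,
# not the project's real API.
import fnmatch

# Global default applied to any layer without an override.
DEFAULT = {"bits": 4, "group_size": 128}

# Illustrative overrides: keep the LM head at higher precision and
# quantize MLP down-projections more aggressively.
LAYER_OVERRIDES = {
    "lm_head": {"bits": 8, "group_size": -1},       # -1 = per-channel
    "*.mlp.down_proj": {"bits": 2, "group_size": 64},
}

def layer_quant_config(layer_name: str) -> dict:
    """Return the quantization config for a layer by name pattern."""
    for pattern, cfg in LAYER_OVERRIDES.items():
        if fnmatch.fnmatch(layer_name, pattern):
            return cfg
    return DEFAULT
```

A quantizer could then call `layer_quant_config(name)` for each module while walking the model, so precision decisions stay in one declarative table instead of being scattered through the quantization loop.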