Pradeepa99 opened 5 days ago
@Pradeepa99 The release notes mention added support for the AWQ format, and this appears to refer to the usage of ipex.llm.optimize, where you can specify quant_method as 'gptq' or 'awq' for the low_precision_checkpoint argument.
Details here: https://intel.github.io/intel-extension-for-pytorch/cpu/2.5.0+cpu/tutorials/api_doc.html#ipex.llm.optimize
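A minimal sketch of what that usage might look like (the qconfig helper and the exact shape of the low_precision_checkpoint argument are assumptions on my side, so please double-check them against the API doc above):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Weight-only quantization recipe (INT4 weights); helper names come from the
# IPEX weight-only quantization API, but verify them against the doc above.
qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=ipex.quantization.WoqWeightDtype.INT4,
)

# Load an AWQ checkpoint produced elsewhere. Passing a (state_dict, config)
# tuple with a "quant_method" entry is my reading of the 2.5.0 release notes
# and may need adjusting to match the documented low_precision_checkpoint format.
awq_state_dict = torch.load("awq_checkpoint.pt")

model = ipex.llm.optimize(
    model,
    dtype=torch.bfloat16,
    quantization_config=qconfig,
    low_precision_checkpoint=(awq_state_dict, {"quant_method": "awq"}),
    inplace=True,
)
```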
Let us know if this helps put you on the right track.
@alexsin368
Thank you for sharing this.
I have three questions I would like to clarify further.
I found this test case example that loads an AWQ-format checkpoint into the ipex.llm.optimize API. - Is this the approach you meant for integrating AWQ support with ipex.llm.optimize?
I found this example for GPTQ, where ipex.quantization.gptq is used to generate the GPTQ checkpoint. - Is there a similar API to generate checkpoints in the AWQ format as well?
Currently, I am following the approach from ITREX mentioned here to generate the quantized model (see the sketch below).
File: https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation_cpu_woq.py - Can we quantize models with this method, or is there a specific approach we should follow instead?
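For reference, here is a simplified sketch of what I am doing now with ITREX (condensed from that script; the config class and argument names are my assumptions and may differ across ITREX versions):

```python
from transformers import AutoTokenizer
# ITREX extends the Hugging Face API; AwqConfig here is my best guess at the
# weight-only quantization config used by run_generation_cpu_woq.py.
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, AwqConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# AWQ weight-only quantization config (bits/group_size values are illustrative)
woq_config = AwqConfig(
    bits=4,
    group_size=128,
    tokenizer=tokenizer,  # used for calibration during quantization
)

# Quantize while loading, then save the quantized model for later inference
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=woq_config,
)
quantized_model.save_pretrained("./llama2-7b-awq-int4")
```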
Describe the issue
I am trying to enable AWQ support with the IPEX repo on CPU.
The IPEX 2.5.0 release states that it supports AWQ quantization,
but we can only see GPTQ support added in the official repo.
Moreover, the script https://github.com/intel/intel-extension-for-pytorch/blob/release/xpu/2.5.10/examples/cpu/llm/inference/utils/run_gptq.py states that it is deprecated and recommends using INC instead.
What is the correct approach to enable AWQ support with the IPEX repo?
Config used: