Closed. Wendy-Xiao closed this issue 5 months ago.
Same as #81. The right way to fix this is to upgrade onnxruntime to the latest main branch build or to release 1.17.3.
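For reference, a pip-based upgrade to the released version would look roughly like the line below (a sketch assuming a standard pip environment; adjust if you build onnxruntime from the main branch instead):
pip install --upgrade onnxruntime==1.17.3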
Hi, I'm also getting the same error, and updating the package to 1.17.3 doesn't work.
I'm trying to convert the Gemma 2B model to 2-bit. These are my commands:
python -m qllm --model google/gemma-2b --quant_method=hqq --groupsize=16 --wbits=2 --save ./google-gemma-2b_hqq_q2
python -m qllm --load ./google-gemma-2b_hqq_q2 --export_onnx=./google-gemma-2b_hqq_q2_onnx --pack_mode=ORT
Can you please share the fix? Also, could you upload a requirements.txt file with the package versions?
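In the meantime, one way to record the package versions in your own environment when reporting the error (assuming pip) is:
pip freeze > requirements.txt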
Hi @csetanmayjain, the tool can only convert 4-bit models to ONNX for now. If you want to use the HQQ++ quantization algorithm, QLLM doesn't support it yet.
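If 4-bit output is acceptable, the commands you posted above should work with only the bit width changed; a sketch (just --wbits and the output paths differ from your original commands, and I have not re-run these):
python -m qllm --model google/gemma-2b --quant_method=hqq --groupsize=16 --wbits=4 --save ./google-gemma-2b_hqq_q4
python -m qllm --load ./google-gemma-2b_hqq_q4 --export_onnx=./google-gemma-2b_hqq_q4_onnx --pack_mode=ORT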
Hi @wejoncy Thanks! Any algorithm works for me, as long as I can quantize the model to 2-bit, export it to ONNX, and run the ONNX model on a CPU.
AFAIK, ONNX/ORT doesn't support any 2-bit quantization scheme for now. llama.cpp would be a good candidate.
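For the llama.cpp route on CPU, the usual flow is to convert the HF checkpoint to GGUF and then quantize with a 2-bit scheme such as Q2_K. A rough sketch (script and binary names vary between llama.cpp versions, and ./gemma-2b is assumed to be a local copy of the model):
python convert_hf_to_gguf.py ./gemma-2b --outfile gemma-2b-f16.gguf
./llama-quantize gemma-2b-f16.gguf gemma-2b-q2_k.gguf Q2_K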
BTW, there is a SOTA 2-bit quantization algorithm and we will make it run in ONNX: VPTQ_arixv.pdf
I see, Thanks :)
Hi there,
I got the same error when exporting a Mistral GPTQ model to ONNX using
python -m qllm --load TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --export_onnx=./mistral-7b-chat-v2-gptq-onnx --pack_mode ORT
as specified in #81, and it fails with the same error. Is there any way to solve the problem?
Thank you!
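For anyone hitting the same export error, the fix suggested earlier in this thread was to upgrade onnxruntime, so one quick first check (assuming a standard install) is to confirm which version is actually being picked up:
python -c "import onnxruntime; print(onnxruntime.__version__)"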