Hi @bg51717, thanks for reporting this. It's been fixed.
However, there are still bugs when saving the model after quantizing it with GPTQ. My command is:

```
model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
--model=/home/binguo/project/models/${model_name} \
--method=gptq \
--nsamples=64 \
--wbits=4 \
--groupsize=128 \
--save ./${model_name}_gptq4b \
--export_onnx ./onnx_model/${model_name}_gptq4b
```
and the error stack is:

```
2024-04-15 11:33:10,114 - qllm - INFO - Finished quantization and packing weight, time cost:171.49598741531372
Traceback (most recent call last):
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/binguo/project/QLLM/qllm/__main__.py", line 6, in <module>
    main()
  File "/home/binguo/project/QLLM/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/home/binguo/project/QLLM/qllm/auto_model_quantization.py", line 220, in run
    AutoQuantizedModelForCausalLM.save_pretrained(model, self.tokenizer, args.save,
  File "/home/binguo/project/QLLM/qllm/modeling/base.py", line 291, in save_pretrained
    model.config.quantization_config = model.quant_config.quant_config
AttributeError: 'GPTQConfig' object has no attribute 'quant_config'
```
I'm interested in model quantization and I believe QLLM is a great project. Thanks for your work!
Hi @bg51717, apologies once more for the inconvenience during the quantization process.
I have tested it locally and it should work. Could you give it another shot? Thanks.
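For reference, the AttributeError above comes from `model.quant_config` being a transformers `GPTQConfig`, which has no `.quant_config` attribute but does provide the standard `to_dict()`. A minimal sketch of the kind of change the save path needs, purely as an illustration and not the actual QLLM patch:

```python
from transformers import PreTrainedModel, PreTrainedTokenizerBase

def save_quantized(model: PreTrainedModel, tokenizer: PreTrainedTokenizerBase,
                   save_directory: str) -> None:
    # Assumption: model.quant_config is a transformers GPTQConfig set by the
    # quantizer; serialize it to a plain dict instead of reading the
    # non-existent .quant_config attribute.
    model.config.quantization_config = model.quant_config.to_dict()
    model.save_pretrained(save_directory)
    tokenizer.save_pretrained(save_directory)
```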
I have tried the previous command again, and now hit a new bug:

```
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/binguo/project/QLLM/onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9
```
I have also tried the solution from https://github.com/microsoft/onnxruntime/issues/20252, but I still get the same error. I know this may not be QLLM's bug; I just want to know whether you see it too, and what environment you use. Thanks!
```
pip install onnx==1.15
```

will fix it.
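For anyone hitting the same mismatch, a quick way to see which IR version the exported file carries versus the onnxruntime build in the environment (the path is just the one from the error above, adjust as needed):

```python
import onnx
import onnxruntime as ort

# Path copied from the error message above; adjust it to your export location.
model_path = "/home/binguo/project/QLLM/onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx"

proto = onnx.load(model_path)
print("model IR version:", proto.ir_version)  # onnx>=1.16 writes IR version 10
print("opsets:", [(imp.domain, imp.version) for imp in proto.opset_import])
print("onnxruntime:", ort.__version__)  # older ORT builds only accept IR <= 9
```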
The command now executes successfully, but the final correctness check does not pass. Does that mean it failed?

```
max abs err_prefill: 0.03906 max abs err_decode: 0.01563 correctness check is not passed
```
Besides, when I try to use the other command:

```
model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
--model=/home/binguo/project/models/${model_name} \
--method=awq \
--dataset=pileval \
--nsamples=16 \
--wbits=4 \
--groupsize=128 \
--save ./${model_name}_awq4b \
--export_onnx ./onnx_model/${model_name}_awq4b
```
it raises an error:

```
  File "/home/binguo/project/QLLM/qllm/quantization/sequential_layes_awq_config.py", line 629, in auto_detect_sequential_layers
    assert model_type in true_sequential_layers_for_model, f"{model_type} is not support"
AssertionError: OPTForCausalLM is not support
```
However, I found `OptForCausalLM` in `true_sequential_layers_for_model` rather than `OPTForCausalLM`:
```python
true_sequential_layers_for_model = dict(
    AquilaForCausalLM=get_aquila_layers,
    BaichuanForCausalLM=get_baichuan_layers,
    BloomForCausalLM=get_bloom_layer,
    FalconForCausalLM=get_falcon_layers,
    GptBigCodeForCausalLM=get_bigcode_layers,
    GPTNeoXForCausalLM=get_neox_layers,
    GPTJForCausalLM=get_gptj_layers,
    LlamaForCausalLM=get_llama_layers,
    LlavaForCausalLM=get_llava_layers,
    MistralForCausalLM=get_mistral_layers,
    MixtralForCausalLM=get_mixtral_layers,
    MptForCausalLM=get_mpt_layers,
    OptForCausalLM=get_opt_layers,
    QwenForCausalLM=get_qwen_layers,
    YiForCausalLM=get_yi_layers,
)
```
I would also like to know how to write a function like `get_baichuan_layers` so that support can be extended to other models. Thanks!
An error of 0.03906 is basically tolerable; the quantized model should produce the same output text as the PyTorch model. And `OptForCausalLM` is indeed a typo of `OPTForCausalLM`.
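One way to do that sanity check is to compare greedy-decoded text from the FP baseline and the quantized model. A minimal sketch, assuming both models are already loaded as transformers-compatible causal LMs; the tokenizer path and prompt are placeholders:

```python
import torch
from transformers import AutoTokenizer, PreTrainedModel

def same_greedy_output(baseline: PreTrainedModel, quantized: PreTrainedModel,
                       tokenizer_path: str = "facebook/opt-350m",
                       prompt: str = "The meaning of life is") -> bool:
    # Greedy-decode a short continuation with both models and compare the text;
    # identical output suggests the ~0.04 max abs error is benign in practice.
    tok = AutoTokenizer.from_pretrained(tokenizer_path)
    inputs = tok(prompt, return_tensors="pt").to(baseline.device)
    with torch.no_grad():
        ref = tok.decode(baseline.generate(**inputs, max_new_tokens=32, do_sample=False)[0])
        out = tok.decode(quantized.generate(**inputs.to(quantized.device), max_new_tokens=32, do_sample=False)[0])
    print(ref, "\n---\n", out)
    return ref == out
```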
If you want to support a new model, please read the original AWQ paper for more details.
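To give a feel for what such a function has to describe, here is a rough sketch in the common AutoAWQ-style layout (`prev_op` / `layers` / `inp` / `module2inspect`): each entry groups the linear layers that share one AWQ scale, the op that feeds them, and the module whose output is inspected. The exact keys and signature QLLM expects may differ, so treat it as an illustration and cross-check against `get_opt_layers` in `sequential_layes_awq_config.py`; the Baichuan module names (`W_pack`, `o_proj`, `gate_proj`, ...) come from the Baichuan2 modeling code:

```python
def get_baichuan_layers_example(module, input_feat, module_kwargs):
    # Illustrative only: which linears inside one Baichuan decoder block are
    # scaled together for AWQ, following the common AutoAWQ dict layout.
    layers = []
    # QKV projection (W_pack), scaled against the preceding input LayerNorm.
    layers.append(dict(
        prev_op=module.input_layernorm,
        layers=[module.self_attn.W_pack],
        inp=input_feat["self_attn.W_pack"],
        module2inspect=module.self_attn,
        kwargs=module_kwargs,
    ))
    # Attention output projection, scaled against W_pack.
    layers.append(dict(
        prev_op=module.self_attn.W_pack,
        layers=[module.self_attn.o_proj],
        inp=input_feat["self_attn.o_proj"],
    ))
    # MLP input projections, scaled against the post-attention LayerNorm.
    layers.append(dict(
        prev_op=module.post_attention_layernorm,
        layers=[module.mlp.gate_proj, module.mlp.up_proj],
        inp=input_feat["mlp.gate_proj"],
        module2inspect=module.mlp,
    ))
    # MLP output projection, scaled against up_proj.
    layers.append(dict(
        prev_op=module.mlp.up_proj,
        layers=[module.mlp.down_proj],
        inp=input_feat["mlp.down_proj"],
    ))
    return layers
```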
Hi @wejoncy, thanks for your great work. I'm studying model quantization through this project. Is the project currently complete? I ask because I noticed 'TODO' placeholders in the code and some discrepancies between function definitions and their usage. How complete are GPTQ, AWQ, and HQQ?
Yeah, the quantization functionality is almost done. Some TODOs are for code cleanup/refactoring. My next plan is to support other quantization algorithms if available.
When I use

```
python -m qllm --model=/root/models/baichuan-inc/Baichuan2-7B-Base --method=gptq --nsamples=64 --wbits=4 --groupsize=128 --save /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b --export_onnx /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b_onnx/
```

it raises an error: