wejoncy / QLLM

A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ, and easy export to ONNX/ONNX Runtime.
Apache License 2.0
150 stars 15 forks

TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device' #112

Closed bg51717 closed 7 months ago

bg51717 commented 7 months ago

When I run python -m qllm --model=/root/models/baichuan-inc/Baichuan2-7B-Base --method=gptq --nsamples=64 --wbits=4 --groupsize=128 --save /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b --export_onnx /root/models/baichuan-inc/Baichuan2-7B-Base_gptq_4b_onnx/, it raises an error:


Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/QLLM/qllm/__main__.py", line 6, in <module>
    main()
  File "/root/QLLM/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/root/QLLM/qllm/auto_model_quantization.py", line 215, in run
    model = self.pack_model(model, quantizers, args.pack_mode)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/QLLM/qllm/auto_model_quantization.py", line 80, in pack_model
    make_mixbits_quant_linear(model, quantizers, quant_config_by_layer, target_layer=target_layer, device="cpu")
TypeError: make_mixbits_quant_linear() got an unexpected keyword argument 'device'
wejoncy commented 7 months ago

Hi @bg51717 Thanks for reporting this. It's been fixed.

bg51717 commented 7 months ago

However, there are still bugs present when saving the model after quantizing it with GPTQ. My command is

model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
    --model=/home/binguo/project/models/${model_name} \
    --method=gptq \
    --nsamples=64 \
    --wbits=4 \
    --groupsize=128 \
    --save ./${model_name}_gptq4b \
    --export_onnx ./onnx_model/${model_name}_gptq4b

and the error stack is

2024-04-15 11:33:10,114 - qllm - INFO - Finished quantization and packing weight, time cost:171.49598741531372
Traceback (most recent call last):
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/binguo/.conda/envs/QLLM/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/binguo/project/QLLM/qllm/__main__.py", line 6, in <module>
    main()
  File "/home/binguo/project/QLLM/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/home/binguo/project/QLLM/qllm/auto_model_quantization.py", line 220, in run
    AutoQuantizedModelForCausalLM.save_pretrained(model, self.tokenizer, args.save,
  File "/home/binguo/project/QLLM/qllm/modeling/base.py", line 291, in save_pretrained
    model.config.quantization_config = model.quant_config.quant_config
AttributeError: 'GPTQConfig' object has no attribute 'quant_config'

I'm interested in model quantization and I believe QLLM is a great project. Thanks for your work!

wejoncy commented 7 months ago

Hi @bg51717, apologies once more for the inconvenience during the quantization process.

I have tested it locally and it should work. Could you give it another shot? Thanks.

bg51717 commented 7 months ago

I have tried the previous commands and hit a new bug.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/binguo/project/QLLM/onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const onnxruntime::PathString&, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

I also tried the solution from https://github.com/microsoft/onnxruntime/issues/20252, but I still get the same error. I know this may not be QLLM's bug; I'd like to know whether you hit it too, and what your environment is. Thanks!

wejoncy commented 7 months ago

pip install onnx==1.15 will fix it.
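
For context: the mismatch is likely because onnx 1.16 started writing IR version 10, while the onnxruntime build in use only loads graphs up to IR version 9, so pinning onnx to 1.15 keeps the export at IR 9. A minimal sketch (untested here, assuming onnx and onnxruntime are importable) to confirm which IR version the exported graph carries:

# Sketch: compare the IR version stored in the exported model with the
# onnxruntime build that has to load it.
import onnx
import onnxruntime as ort

m = onnx.load("onnx_model/facebook/opt-350m_gptq4b/decoder_merged.onnx")
print("model IR version:", m.ir_version)        # 10 when exported with onnx>=1.16
print("onnxruntime version:", ort.__version__)  # the failing build supports IR <= 9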

bg51717 commented 7 months ago

The command executed successfully, but it seems the final correctness check did not pass. So, has it failed?

max abs err_prefill: 0.03906 max abs err_decode: 0.01563 correctness check is  not  passed   

Besides, when I try to use the other command:

model_name='facebook/opt-350m'
CUDA_VISIBLE_DEVICES=0
python -m qllm \
    --model=/home/binguo/project/models/${model_name} \
    --method=awq \
    --dataset=pileval \
    --nsamples=16 \
    --wbits=4 \
    --groupsize=128 \
    --save ./${model_name}_awq4b \
    --export_onnx ./onnx_model/${model_name}_awq4b

it raises an error:

File "/home/binguo/project/QLLM/qllm/quantization/sequential_layes_awq_config.py", line 629, in auto_detect
_sequential_layers 
       assert model_type in true_sequential_layers_for_model, f"{model_type} is not support"
AssertionError: OPTForCausalLM is not support 

However, I found "OptForCausalLM" in true_sequential_layers_for_model rather than "OPTForCausalLM".

true_sequential_layers_for_model = dict(
    AquilaForCausalLM=get_aquila_layers,
    BaichuanForCausalLM=get_baichuan_layers,
    BloomForCausalLM=get_bloom_layer,
    FalconForCausalLM=get_falcon_layers,
    GptBigCodeForCausalLM=get_bigcode_layers,
    GPTNeoXForCausalLM=get_neox_layers,
    GPTJForCausalLM=get_gptj_layers,
    LlamaForCausalLM=get_llama_layers,
    LlavaForCausalLM=get_llava_layers,
    MistralForCausalLM=get_mistral_layers,
    MixtralForCausalLM=get_mixtral_layers,
    MptForCausalLM=get_mpt_layers,
    OptForCausalLM=get_opt_layers,
    QwenForCausalLM=get_qwen_layers,
    YiForCausalLM=get_yi_layers,
)

I also want to know how to write a function like get_baichuan_layers to extend functionality. Thanks!

wejoncy commented 7 months ago

If you want to support a new model, please read the original AWQ paper for more detail.

bg51717 commented 7 months ago

Hi @wejoncy, thanks for your great work. I'm studying model quantization through this project. I would like to know whether this project is currently complete, because I noticed there are 'todo' placeholders in the code and some discrepancies between function definitions and their usage. How complete are GPTQ, AWQ, and HQQ?

wejoncy commented 7 months ago

Yeah, it's almost done in terms of quantization functionality. Some TODOs are for code cleanup/refactoring. My next plan is to support other quantization algorithms if available.