wejoncy / QLLM

A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime.
Apache License 2.0

Qwen2 model quantization failed with "AssertionError: Qwen2ForCausalLM is not support" #137

Closed: FlexLaughing closed this issue 2 months ago

FlexLaughing commented 2 months ago

Hi wejoncy, I am trying to quantize the Qwen2-7B model with this repo and hit the error below. Do we support Qwen2 for AWQ int4 quantization?

dataloader = torch.load(cache_dir)
Starting ...
Ready.
Running AWQ...:   0%|          | 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/__main__.py", line 6, in <module>
    main()
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/run.py", line 78, in main
    model_quanter.run(args)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/auto_model_quantization.py", line 227, in run
    quantizers = self.__dispatch_quant(model, inputs_dataloader, config, "cuda")
  File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/auto_model_quantization.py", line 44, in __dispatch_quant
    return quantizer.quantize(model, inputs_dataloader, dev)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/quant_frame_base.py", line 119, in quantize
    quantizers.update(self.do_quantize(model, dataloader, prefix, dev))
  File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/quant_awq.py", line 125, in do_quantize
    in_quantizer.fast_quant_layer(layer_kwargs, input_feat, layer, attention_layers, i, model.__class__.__name__)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 383, in fast_quant_layer
    scales_list = self.auto_scale_block(layer, layer_kwargs, input_feat=input_feat, model_type=model_type)
  File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 374, in auto_scale_block
    sub_modules = get_model_specific_quant_layer(
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 10, in get_model_specific_quant_layer
    scales_list = auto_detect_sequential_layers(module, input_feat, model_type, module_kwargs)
  File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/sequential_layes_awq_config.py", line 629, in auto_detect_sequential_layers
    assert model_type in true_sequential_layers_for_model, f"{model_type} is not support"
AssertionError: Qwen2ForCausalLM is not support
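For reference, a quick way to check whether Qwen2's decoder layers really expose the same submodules as Llama's (before reusing an existing AWQ layer config) is to instantiate both architectures on the meta device and diff the module names. This is a minimal sketch using the standard transformers API; the model ids are illustrative (the Llama checkpoint is gated), so substitute whatever checkpoints you have locally.

```python
# Compare the decoder-layer submodule names of Qwen2 and Llama to see whether
# an existing AWQ layer config could plausibly be reused. Model ids are
# illustrative; weights are never materialized thanks to the meta device.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

def layer_module_names(model_id):
    config = AutoConfig.from_pretrained(model_id)
    with torch.device("meta"):  # build the module tree without allocating weights
        model = AutoModelForCausalLM.from_config(config)
    first_layer = model.model.layers[0]
    return sorted(name for name, _ in first_layer.named_modules() if name)

qwen2_names = layer_module_names("Qwen/Qwen2-7B")
llama_names = layer_module_names("meta-llama/Llama-2-7b-hf")
print("Qwen2 only:", set(qwen2_names) - set(llama_names))
print("Llama only:", set(llama_names) - set(qwen2_names))
```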

FlexLaughing commented 2 months ago

The model structure looks the same as Llama's, so I tried using the get_llama_layers method in qllm/quantization/sequential_layes_awq_config.py to quantize Qwen2. It works and generates an int4 model, but the inference results of the int4 model are not correct.
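For anyone following along, the workaround described above amounts to adding an entry for Qwen2ForCausalLM to the table that the failing assertion checks. A rough sketch, assuming the table in qllm/quantization/sequential_layes_awq_config.py is keyed by the model class name (which is what the traceback passes as model_type) and already contains a Llama entry:

```python
# Hypothetical patch inside qllm/quantization/sequential_layes_awq_config.py.
# The dict name comes from the failing assertion; the "LlamaForCausalLM" key
# is an assumption about how the existing entries are registered.
true_sequential_layers_for_model["Qwen2ForCausalLM"] = \
    true_sequential_layers_for_model["LlamaForCausalLM"]
```

One structural difference worth checking when the int4 output looks wrong: Qwen2's q/k/v projections carry biases while Llama's do not, so any AWQ scaling step that folds scales into the projection weights but ignores the projection biases could corrupt the quantized result.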

wejoncy commented 2 months ago

> The model structure looks the same as Llama's, so I tried using the get_llama_layers method in qllm/quantization/sequential_layes_awq_config.py to quantize Qwen2. It works and generates an int4 model, but the inference results of the int4 model are not correct.

Yeah, it would be great if you could find some time to debug the issue. I have to work on other things these days.

FlexLaughing commented 2 months ago

> > The model structure looks the same as Llama's, so I tried using the get_llama_layers method in qllm/quantization/sequential_layes_awq_config.py to quantize Qwen2. It works and generates an int4 model, but the inference results of the int4 model are not correct.
>
> Yeah, it would be great if you could find some time to debug the issue. I have to work on other things these days.

Thanks, I will try to debug the inference results for the Qwen1.0 and Qwen2 quantized models.
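In case it helps with that debugging, here is a minimal sanity-check sketch that runs the same greedy prompt through the FP16 base model and the quantized export and prints both completions. The model id and output directory are placeholders, and loading the int4 checkpoint with plain AutoModelForCausalLM is only a guess; swap in however you normally load the QLLM output.

```python
# Sanity check for the "int4 results look wrong" symptom: run the same greedy
# prompt through the FP16 base model and the quantized export and compare.
# NOTE: base_id / quant_dir are placeholders, and loading the QLLM int4 output
# with plain AutoModelForCausalLM is an assumption -- use your normal loader.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2-7B"            # illustrative base checkpoint
quant_dir = "./qwen2-7b-awq-int4"    # hypothetical quantized output dir

tok = AutoTokenizer.from_pretrained(base_id)
inputs = tok("The capital of France is", return_tensors="pt").to("cuda")

for path in (base_id, quant_dir):
    model = AutoModelForCausalLM.from_pretrained(
        path, torch_dtype=torch.float16
    ).to("cuda").eval()
    out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
    print(f"{path}: {tok.decode(out[0], skip_special_tokens=True)}")
    del model
    torch.cuda.empty_cache()
```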