Closed · FlexLaughing closed this issue 2 months ago
The model structures look the same, so I am trying to use the get_llama_layers method in qllm/quantization/sequential_layes_awq_config.py to quantize Qwen2. It runs and generates an int4 model, but the inference results of the int4 model are not correct.
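As a quick way to check how close the two architectures really are, here is a minimal sketch (not from the original report; it assumes transformers is installed and that the configs of both checkpoints are accessible on the Hugging Face hub) that compares the sub-module names of the first decoder layer of each model:

```python
# Sketch: compare the decoder-layer structure of Llama and Qwen2 without
# downloading weights, by building each model from its config only.
# The model ids below are assumptions; any Llama / Qwen2 checkpoint works.
from transformers import AutoConfig, AutoModelForCausalLM

def layer_module_names(model_id: str) -> set:
    cfg = AutoConfig.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_config(cfg)  # random weights, structure only
    first_layer = model.model.layers[0]
    return {name for name, _ in first_layer.named_modules() if name}

llama = layer_module_names("meta-llama/Llama-2-7b-hf")
qwen2 = layer_module_names("Qwen/Qwen2-7B")
print("only in Llama:", sorted(llama - qwen2))
print("only in Qwen2:", sorted(qwen2 - llama))
print("shared:", sorted(llama & qwen2))
```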
Yeah, it would be great if you could find some time to debug the issue. I have to work on other things these days.
Thanks, I will try to debug the inference results of the Qwen1.0 and Qwen2 quantized models.
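As a rough starting point for that debugging, here is a sketch in plain PyTorch (this is not QLLM's own API) that fake-quantizes each Linear weight in the first decoder layer to group-wise asymmetric int4 (group size 128 is assumed, the usual AWQ setting) and prints the relative weight-reconstruction error per projection, to see whether any particular projection is disproportionately damaged:

```python
# Rough debugging sketch: simulate 4-bit group-wise quantization of a weight
# matrix and measure the relative reconstruction error, layer by layer.
import torch
from transformers import AutoModelForCausalLM

def fake_quant_rel_error(w: torch.Tensor, bits: int = 4, group_size: int = 128) -> float:
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // group_size, group_size).float()
    w_min = g.amin(dim=-1, keepdim=True)
    w_max = g.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (2 ** bits - 1)
    q = ((g - w_min) / scale).round().clamp(0, 2 ** bits - 1)
    dequant = (q * scale + w_min).reshape(out_f, in_f)
    return ((dequant - w.float()).norm() / w.float().norm()).item()

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B", torch_dtype=torch.float16)
for name, mod in model.named_modules():
    if isinstance(mod, torch.nn.Linear) and name.startswith("model.layers.0."):
        print(f"{name}: relative int4 error = {fake_quant_rel_error(mod.weight.data):.4f}")
```

This only measures weight-reconstruction error, not the effect of AWQ's activation-aware scaling, but large outliers in specific projections are usually a good first hint.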
Hi wejoncy, I am trying to quantize the Qwen2-7B model with this repo and hit the issue below. Do we support AWQ int4 quantization for Qwen2?

dataloader = torch.load(cache_dir)
Starting ...
Ready.
Running AWQ...: 0%| | 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/main.py", line 6, in <module>
main()
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/run.py", line 78, in main
model_quanter.run(args)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/auto_model_quantization.py", line 227, in run
quantizers = self.__dispatch_quant(model, inputs_dataloader, config, "cuda")
File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/auto_model_quantization.py", line 44, in dispatch_quant
return quantizer.quantize(model, inputs_dataloader, dev)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/quant_frame_base.py", line 119, in quantize
quantizers.update(self.do_quantize(model, dataloader, prefix, dev))
File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/quant_awq.py", line 125, in do_quantize
in_quantizer.fast_quant_layer(layer_kwargs, input_feat, layer, attention_layers, i, model.__class__.__name__)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 383, in fast_quant_layer
scales_list = self.auto_scale_block(layer, layer_kwargs, input_feat=input_feat, model_type=model_type)
File "/home/ubuntu/miniconda3_debug/envs/qllm_export_onnx/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 374, in auto_scale_block
sub_modules = get_model_specific_quant_layer(
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/_awq_quantizer.py", line 10, in get_model_specific_quant_layer
scales_list = auto_detect_sequential_layers(module, input_feat, model_type, module_kwargs)
File "/home/ubuntu/data/awq_quan_demo/QLLM-0.2.0/qllm/quantization/sequential_layes_awq_config.py", line 629, in auto_detect_sequential_layers
assert model_type in true_sequential_layers_for_model, f"{model_type} is not support"
AssertionError: Qwen2ForCausalLM is not support
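A possible workaround sketch, based on assumptions about the repo's internals rather than anything verified: the assertion fires because "Qwen2ForCausalLM" has no entry in true_sequential_layers_for_model in sequential_layes_awq_config.py. Assuming that dict maps the model class name to a layer-config function such as get_llama_layers (as the first comment in this thread suggests), one could try registering the Qwen2 class under the existing Llama description before running the quantizer:

```python
# Workaround sketch (assumption, not verified against the repo): reuse the
# Llama layer description for Qwen2ForCausalLM so the assertion no longer fires.
from qllm.quantization import sequential_layes_awq_config as awq_cfg

awq_cfg.true_sequential_layers_for_model["Qwen2ForCausalLM"] = awq_cfg.get_llama_layers
```

Note that this only gets past the assertion; the earlier comments report that an int4 model produced this way still gives incorrect inference results, so Qwen2 likely needs its own verified layer description.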