Closed mengniwang95 closed 1 week ago
@libinta @sywangyi please review. Thx
LGTM. Could you also share the model that you tested for this? (GPTQ quantized qwen model)
I generated the quantized Qwen2 model by following the instructions at this link: https://github.com/intel/neural-compressor/tree/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only#quantization-cpu--hpu
The code quality check failed. Please run make style.
Support loading 4-bit quantized Qwen2
Error log: File "/home/mewang/workspace/optimum-habana/examples/text-generation/run_generation.py", line 758, in <module>
main()
File "/home/mewang/workspace/optimum-habana/examples/text-generation/run_generation.py", line 523, in main
generate(None, args.reduce_recompile)
File "/home/mewang/workspace/optimum-habana/examples/text-generation/run_generation.py", line 494, in generate
outputs = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/generation/utils.py", line 1417, in generate
result = self._sample(
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/generation/utils.py", line 2396, in _sample
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1565, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 726, in forward
return wrapped_hpugraph_forward(
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 599, in wrapped_hpugraph_forward
outputs = orig_fwd(*args, **kwargs)
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 793, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 699, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1556, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1606, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 467, in forward
hidden_states, self_attn_weights, present_key_value = self.pre_attn(
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 518, in pre_attn
hidden_states, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
File "/home/mewang/workspace/optimum-habana/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 319, in pre_attn_forward
past_key = torch.zeros(key_states.shape, dtype=self.k_proj.weight.dtype, device=key_states.device)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1732, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'HPUWeightOnlyLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
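
The traceback fails because pre_attn_forward reads self.k_proj.weight.dtype, and the weight-only quantized layer exposes qweight instead of weight. The diff for this PR is not shown here, so the following is only a sketch of one possible way to avoid the lookup. The helper name projection_dtype is hypothetical, and the stand-in objects replace real torch modules so the snippet runs without torch or an HPU.

```python
from types import SimpleNamespace


def projection_dtype(proj, fallback_dtype):
    """Pick a dtype for KV-cache buffers associated with projection `proj`.

    Dense linear layers expose `.weight`, so its dtype can be used directly.
    Weight-only quantized layers (like the HPUWeightOnlyLinear in the
    traceback) have no `.weight`; their packed `qweight` is an integer
    tensor, so its dtype is not a suitable cache dtype. In that case the
    caller-supplied fallback (for example key_states.dtype) is returned.
    """
    weight = getattr(proj, "weight", None)  # None when the attribute is missing
    if weight is not None:
        return weight.dtype
    return fallback_dtype


# Stand-ins for illustration only (hypothetical, not real torch modules).
dense_proj = SimpleNamespace(weight=SimpleNamespace(dtype="bfloat16"))
quant_proj = SimpleNamespace(qweight=SimpleNamespace(dtype="int32"))

dense_dtype = projection_dtype(dense_proj, fallback_dtype="float32")
quant_dtype = projection_dtype(quant_proj, fallback_dtype="bfloat16")
```

Under these assumptions, the failing line could become torch.zeros(key_states.shape, dtype=projection_dtype(self.k_proj, key_states.dtype), device=key_states.device). Passing key_states.dtype directly would be simpler still, provided the cache is meant to match the activations.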