qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0
2.98k stars · 457 forks

inference with the saved model error: AttributeError: module 'torch.backends.cuda' has no attribute 'sdp_kernel' #271

Open · LuciaIsFine opened this issue 1 year ago

LuciaIsFine commented 1 year ago

```
Loading model ... Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|█████████████████████████████████████████| 12/12 [00:34<00:00, 2.85s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
100%|█████████████████████████████████████████| 12/12 [00:17<00:00, 1.45s/it]
Done.
Traceback (most recent call last):
  File "llama_inference.py", line 120, in <module>
    generated_ids = model.generate(
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/luzijia/GPTQ-for-LLaMa-triton/quant/fused_attn.py", line 154, in forward
    with torch.backends.cuda.sdp_kernel(enable_math=False):
AttributeError: module 'torch.backends.cuda' has no attribute 'sdp_kernel'
```

lyhaha2020 commented 10 months ago

I hit the same error. Have you solved it? Is this a problem with the torch version?

TingxunShi commented 9 months ago

> I hit the same error. Have you solved it? Is this a problem with the torch version?

I got the same error on PyTorch 1.12.1. After I updated to 2.0.1, it was gone.