RealCalumPlays opened this issue 1 year ago
Same error here on a Tesla V100-SXM2-32GB.
There is a choice of three kernels:
torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False)
Currently, only flash attention is on. Try enabling the other options as well.
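To make that concrete, here is a minimal standalone sketch (assuming torch 2.0.x, where `torch.backends.cuda.sdp_kernel` is still the context-manager API; the tensor shapes are made up, not Falcon's real ones) of calling `scaled_dot_product_attention` with all three backends enabled, so PyTorch can fall back to whichever kernel the GPU supports:

```python
import torch
import torch.nn.functional as F

# Toy fp16 tensors standing in for the attention inputs:
# [batch, num_heads, seq_len, head_dim].
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Enable all three SDPA backends; PyTorch picks the fastest one the
# hardware supports. Flash attention is not implemented for Volta
# (sm_70), so on a V100 it falls back to mem_efficient or math.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=True, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v, None, 0.0, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 128, 64])
```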
> Same error here on Tesla V100-SXM2-32GB
Same issue for me as well on the same machine, details below:
OS: Ubuntu 18.04.5 LTS
Libs:
bitsandbytes==0.39.0
transformers==4.29.2
triton==2.0.0
sentencepiece==0.1.99
datasets==2.12.0
peft==0.3.0
torch==2.0.1+cu118
accelerate==0.19.0
safetensors==0.3.1
einops==0.6.1
wandb==0.15.3
scipy==1.10.1
> There is a choice of three kernels:
> torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False)
> Currently, only flash attention is on. Try enabling the other options as well.
Doing this gives the error below:
Traceback (most recent call last):
File "falcontune/run.py", line 93, in <module>
main()
File "falcontune/run.py", line 89, in main
args.func(args)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/finetune.py", line 162, in fin
etune
trainer.train()
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 1070, in forward
transformer_outputs = self.transformer(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 965, in forward
outputs = block(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 634, in forward
attn_outputs = self.self_attention(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/falcon/model.py", line 486, in forward
fused_qkv = self.query_key_value(hidden_states) # [batch_size, seq_length, 3 x hidden_size]
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/model/lora.py", line 54, in forward
result = self.quant_class.forward(self, x)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/quantlinear.py", line 13, in forward
out = AutogradMatmul.apply(
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 106, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/autograd.py", line 11, in forward
output = tu.triton_matmul(x, qweight, scales, qzeros, g_idx, bits, maxq)
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/triton_utils.py", line 246, in triton_matmul
matmul_248_kernel[grid](input, qweight, output,
File "/home/users/user/falcontune/venv_falcontune/lib/python3.8/site-packages/falcontune-0.1.0-py3.8.egg/falcontune/backend/triton/custom_autotune.py", line 110, in run
return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
File "<string>", line 24, in matmul_248_kernel
ValueError: Pointer argument (at 1) cannot be accessed from Triton (cpu tensor?)
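For what it's worth, that ValueError comes from Triton rather than from attention: Triton kernels can only dereference GPU pointers, so it usually means one of the tensors handed to `matmul_248_kernel` (here the quantized weight) is still on the CPU, for example because part of the model was left or offloaded there. A quick, hypothetical check (the `model` variable is whatever falcontune gives you after loading; `report_cpu_tensors` is not a falcontune API) is to list everything that isn't on CUDA:

```python
def report_cpu_tensors(model):
    """Print every parameter/buffer that is not on a CUDA device.

    The GPTQ tensors seen in the traceback (qweight, scales, qzeros) are
    normally registered as buffers, so named_buffers() should cover them.
    """
    for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
        if tensor.device.type != "cuda":
            print(f"{name}: {tensor.device}")

report_cpu_tensors(model)  # expect no output if everything was moved to the GPU
```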
I was having this same issue on a Google Colab V100; switching to an A100 fixed it for me.
Any fix for this? I'm still getting this issue.
On the V100 we need to enable the mem_efficient mode; it doesn't support native flash attention.
--- a/falcontune/model/falcon/model.py
+++ b/falcontune/model/falcon/model.py
@@ -523,7 +523,7 @@ class Attention40B(nn.Module):
key_layer_ = key_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
value_layer_ = value_layer.reshape(batch_size, self.num_heads, -1, self.head_dim)
- with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
+ with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=True):
attn_output = F.scaled_dot_product_attention(
query_layer_, key_layer_, value_layer_, None, 0.0, is_causal=True
)
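If you want to confirm which backends actually work on your card before patching the model, a quick probe (a standalone sketch, not part of falcontune) is to try each backend in isolation:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Enable exactly one SDPA backend at a time and see which ones succeed.
backends = {
    "flash": dict(enable_flash=True, enable_math=False, enable_mem_efficient=False),
    "mem_efficient": dict(enable_flash=False, enable_math=False, enable_mem_efficient=True),
    "math": dict(enable_flash=False, enable_math=True, enable_mem_efficient=False),
}

for name, flags in backends.items():
    try:
        with torch.backends.cuda.sdp_kernel(**flags):
            F.scaled_dot_product_attention(q, k, v, None, 0.0, is_causal=True)
        print(f"{name}: ok")
    except RuntimeError as exc:
        print(f"{name}: {exc}")
```

On a V100 you should see "flash" fail while "mem_efficient" and "math" succeed, which is why flipping `enable_mem_efficient` to `True` in the patch above works.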
Any ideas? Full log below:
Traceback (most recent call last):
File "/home/cosmos/miniconda3/envs/ftune/bin/falcontune", line 33, in <module>
sys.exit(load_entry_point('falcontune==0.1.0', 'console_scripts', 'falcontune')())
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/run.py", line 87, in main
args.func(args)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/finetune.py", line 162, in finetune
trainer.train()
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 1070, in forward
transformer_outputs = self.transformer(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 965, in forward
outputs = block(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 698, in forward
attn_outputs = self.self_attention(
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/cosmos/miniconda3/envs/ftune/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 337, in forward
attn_output = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.
EDIT: CUDA is installed in the kernel modules, on the system, and in the environment, just to rule that out. Using Python 3.10.6.
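Note that in this log the failing `F.scaled_dot_product_attention` call is at model.py line 337, which looks like a different attention class than the `Attention40B` block patched around line 523 above, so that call site probably needs the same `enable_mem_efficient=True` change. You can also check what the GPU and the current process report (a sketch using the torch 2.0 query functions, not a fix):

```python
import torch

# A V100 reports compute capability (7, 0); the flash backend does not
# support Volta, so only the mem_efficient and math backends can run there.
print(torch.cuda.get_device_capability())
print("flash enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math enabled:         ", torch.backends.cuda.math_sdp_enabled())
```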