Xformers attention backwards pass not working in diffusers | RuntimeError: p.gQ_strideM() == grad_q.stride(1) INTERNAL ASSERT FAILED at "*/mem_eff_attention/attention_backward_generic.cu":180, #1261
Describe the bug
I've been trying to use xformers for training with DreamBooth, as well as for training Waifu Diffusion. In both cases it crashes before finishing even one iteration, with the error shown in the logs below.
Xformers appears to work perfectly fine for inference:
100%|███████████████████████████████████████████| 51/51 [00:03<00:00, 13.33it/s]
pipe.enable_xformers_memory_efficient_attention()
image = pipe(prompt).images[0]
100%|███████████████████████████████████████████| 51/51 [00:02<00:00, 19.49it/s]
and running the xformers memory-efficient attention benchmark in the same environment also completes with no issue.
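For reference, here is that inference check written out as a self-contained snippet (a minimal sketch; the model id and prompt are taken from the training command below, while the fp16/CUDA choices are assumptions):

import torch
from diffusers import StableDiffusionPipeline

# Same model as in the training command below; fp16 on CUDA is assumed.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of sks dog"

# Plain attention: ~13 it/s on the 3090.
image = pipe(prompt).images[0]

# With xformers memory-efficient attention: ~19 it/s, no errors (forward pass only).
pipe.enable_xformers_memory_efficient_attention()
image = pipe(prompt).images[0]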
I've tried many different Python/PyTorch/CUDA/xformers combinations, but nothing makes training work.
I have received another report of the same error occurring on a rented server with completely different hardware except for the 3090, which leads me to believe this might be a general issue with this specific GPU.
Sorry if this bug report is a bit of a mess; this issue has been haunting me for a few days.
Reproduction
Run the DreamBooth training script with
unet.set_use_memory_efficient_attention_xformers(True)
enabled, on an RTX 3090.
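In context, that call looks roughly like this in the training setup (a sketch, assuming the UNet is loaded the way train_dreambooth.py loads it):

from diffusers import UNet2DConditionModel

# Load the UNet the same way the training script does.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Forward passes run fine with this enabled; the crash happens later, in accelerator.backward(loss).
unet.set_use_memory_efficient_attention_xformers(True)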
Logs
Steps: 0%| | 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_dreambooth.py", line 670, in <module>
main(args)
File "train_dreambooth.py", line 626, in main
accelerator.backward(loss)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/accelerate/accelerator.py", line 1007, in backward
loss.backward(**kwargs)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
return user_fn(self, *args)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/xformers/ops.py", line 369, in backward
) = torch.ops.xformers.efficient_attention_backward_cutlass(
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/torch/_ops.py", line 442, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: p.gQ_strideM() == grad_q.stride(1) INTERNAL ASSERT FAILED at "/home/runner/work/xfromers_builds/xfromers_builds/xformers/xformers/components/attention/csrc/cuda/mem_eff_attention/attention_backward_generic.cu":180, please report a bug to PyTorch.
Steps: 0%| | 0/800 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/bunny/miniconda3/envs/xformers/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/accelerate/commands/launch.py", line 910, in launch_command
simple_launcher(args)
File "/home/bunny/miniconda3/envs/xformers/lib/python3.8/site-packages/accelerate/commands/launch.py", line 400, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/bunny/miniconda3/envs/xformers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=./dog', '--output_dir=./dbout', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=800']' returned non-zero exit status 1.
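For what it's worth, the failing op can also be exercised directly through xformers, outside of the training script. Something along these lines should hit the same cutlass backward kernel (a sketch with made-up shapes, so it may or may not trip the exact same stride assert):

import torch
import xformers.ops as xops

# fp16 tensors shaped like flattened UNet attention inputs (batch*heads, seq_len, head_dim); shapes are illustrative.
q = torch.randn(16, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(16, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(16, 4096, 40, device="cuda", dtype=torch.float16, requires_grad=True)

# Forward works, consistent with inference being fine ...
out = xops.memory_efficient_attention(q, k, v)

# ... while the backward pass is where efficient_attention_backward_cutlass asserts during training.
out.sum().backward()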
System Info
diffusers version: 0.7.2