lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch
MIT License
8.46k stars 1.04k forks

Is there a workaround for the 3090? #344

Open momo1986 opened 2 months ago

momo1986 commented 2 months ago

My machine is a 3090 platform.

It looks like the usable PyTorch version is limited on this setup.

Is there any workaround for this version?

Thanks & Regards!

dabensongbing commented 1 month ago

A100 GPU detected, using flash attention if input tensor is on cuda
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:415.)
  out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:456.)
  out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:417.)
  out = F.scaled_dot_product_attention(
  0%|          | 0/700000 [02:43<?, ?it/s]
Traceback (most recent call last):
  File "D:\denoising-diffusion-pytorch-main\test.py", line 32, in <module>
    trainer.train()
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 1058, in train
    loss = self.model(data)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 841, in forward
    return self.p_losses(img, t, *args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 817, in p_losses
    model_out = self.model(x, t, x_self_cond)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 411, in forward
    x = attn(x) + x
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 269, in forward
    out = self.attend(q, k, v)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 107, in forward
    return self.flash_attn(q, k, v)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 88, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

I ran into this problem and do not know how to solve it.

MADAO-King commented 1 week ago

I have encountered the same problem, have you solved it?

dabensongbing commented 1 week ago

> I have encountered the same problem, have you solved it?

I haven't solved it. I found that something isn't supported on the 3090, some PyTorch mechanism... maybe you need an A100.
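A possible workaround, assuming a recent version of this repository where `Unet` accepts a `flash_attn` argument (as in the README example), is simply to turn flash attention off so `attend.py` falls back to its ordinary attention path instead of requiring a flash kernel. The sketch below mirrors the README training example; the dataset folder and hyperparameters are placeholders:

```python
# Sketch: README-style training with flash attention disabled, so the model
# does not depend on the flash SDPA kernel that this PyTorch build lacks.
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = False   # avoid the flash-attention code path on the 3090
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000
)

trainer = Trainer(
    diffusion,
    'path/to/images',               # placeholder dataset folder
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True
)

trainer.train()
```

With `flash_attn = False` the attention falls back to the plain implementation, so it should run on a 3090 even when the installed PyTorch wheel ships no fused attention kernels, at the cost of some speed and memory.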