Open momo1986 opened 2 months ago
A100 GPU detected, using flash attention if input tensor is on cuda
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:415.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:456.)
out = F.scaled_dot_product_attention(
D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py:88: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:417.)
out = F.scaled_dot_product_attention(
0%| | 0/700000 [02:43<?, ?it/s]
Traceback (most recent call last):
  File "D:\denoising-diffusion-pytorch-main\test.py", line 32, in <module>
    trainer.train()
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 1058, in train
    loss = self.model(data)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\accelerate\utils\operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 841, in forward
    return self.p_losses(img, t, *args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 817, in p_losses
    model_out = self.model(x, t, x_self_cond)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 411, in forward
    x = attn(x) + x
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\denoising_diffusion_pytorch.py", line 269, in forward
    out = self.attend(q, k, v)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\condaa312\envs\ddp\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 107, in forward
    return self.flash_attn(q, k, v)
  File "D:\denoising-diffusion-pytorch-main\denoising_diffusion_pytorch\attend.py", line 88, in flash_attn
    out = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

I have encountered this problem and don't know how to solve it.
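For anyone debugging the same failure, a minimal check (a sketch, assuming PyTorch 2.x) of which scaled_dot_product_attention backends are enabled at runtime can narrow it down. Note these calls report the runtime toggles only, not whether a kernel was actually compiled into the wheel:

```python
import torch

# Runtime toggles for the three scaled_dot_product_attention backends.
# "Enabled" does not mean "compiled in"; the "1Torch was not compiled
# with flash attention" warning is about the wheel itself.
print(torch.__version__, torch.version.cuda)
print("flash sdp enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient sdp enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math sdp enabled:         ", torch.backends.cuda.math_sdp_enabled())

# The math backend is implemented everywhere; re-enabling it removes the
# "No available kernel" abort, at the cost of speed.
torch.backends.cuda.enable_math_sdp(True)
```

If math_sdp_enabled() prints False, all three backends are off and F.scaled_dot_product_attention has nothing left to dispatch to, which is exactly the RuntimeError above.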
I have encountered the same problem, have you solved it?
I haven't solved it. I found that some PyTorch attention mechanism isn't supported on the 3090... maybe you need an A100.
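For what it's worth, the 3090 itself should be capable: the flash kernel targets compute capability 8.0+ and a 3090 reports sm_86. A quick check (a sketch, assuming a CUDA build of PyTorch):

```python
import torch

# Flash attention's SDPA kernel targets Ampere and newer GPUs (sm_80+).
# An RTX 3090 reports sm_86, so the hardware qualifies; the warning
# "1Torch was not compiled with flash attention" points at the Windows
# wheel missing the kernel, not at the GPU.
print("device:", torch.cuda.get_device_name())
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: sm_{major}{minor}")
```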
The machine is a 3090 platform.
It looks like the PyTorch version is the limitation.
Is there any workaround for this version?
Thanks & Regards!
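One possible workaround (a sketch, not verified on this exact setup): stop requesting the flash kernel, or force the math backend, which every build implements. Recent copies of this repo expose a flash_attn flag on Unet; if yours predates it, the PyTorch-level context manager below should still apply:

```python
import torch
from denoising_diffusion_pytorch import Unet

# Option 1: don't ask for flash attention at all. Recent versions of the
# repo accept flash_attn on Unet and hand it down to Attend; leaving it
# False uses the plain einsum attention path.
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = False
)

# Option 2: force-enable the math backend so scaled_dot_product_attention
# always has a kernel to dispatch to. (torch.backends.cuda.sdp_kernel is
# the PyTorch <= 2.2 context manager; newer releases replace it with
# torch.nn.attention.sdpa_kernel.)
with torch.backends.cuda.sdp_kernel(enable_flash = False,
                                    enable_mem_efficient = False,
                                    enable_math = True):
    # trainer.train()  # run training inside this context
    pass
```

Either route avoids the hard abort at attend.py line 88; the math path is slower but runs on any device.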