RuntimeError: CUDA error: device-side assert triggered Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

huawei-noah / Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Issue #76

Closed by gf-zhong 1 year ago

gf-zhong commented 1 year ago

I am using a single GPU and hit this error during training. From what I found online, it may be caused by the CUDA version. Have you ever encountered this problem? Could you share your environment versions? Thanks!

gf-zhong commented 1 year ago

When I set device=CPU, I get the same error as the one reported in https://github.com/huawei-noah/Efficient-Computing/issues/68. I have made the following changes:

gf-zhong commented 1 year ago

When I initialize the loss item in the `Trainer` class in Gold-YOLO/yolov6/core/engine.py, training already runs normally, but I'm not sure whether this will cause other problems.
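
For readers unfamiliar with this kind of fix, here is a minimal sketch of the general pattern, not the actual code in engine.py. The class body, the attribute name `loss_item`, the parameter `num_loss_terms`, and both methods are hypothetical stand-ins. The idea is only that giving the attribute a default in `__init__` means any code that reads it before the first training step sees a defined value.

```python
class Trainer:
    """Hypothetical stand-in for a trainer class; not the real Gold-YOLO Trainer."""

    def __init__(self, num_loss_terms=3):
        # Assumption: initialize the loss item up front so that later reads
        # never hit a missing attribute, even before the first training step.
        self.loss_item = [0.0] * num_loss_terms
        self.mean_loss = [0.0] * num_loss_terms
        self.step = 0

    def train_step(self, loss_item):
        # In a real trainer this value would come from the loss function.
        self.loss_item = list(loss_item)

    def update_mean_loss(self):
        # Running mean over the steps seen so far; reads self.loss_item.
        self.mean_loss = [
            (m * self.step + l) / (self.step + 1)
            for m, l in zip(self.mean_loss, self.loss_item)
        ]
        self.step += 1
        return self.mean_loss


trainer = Trainer()
print(trainer.update_mean_loss())  # safe even though no step has run yet
```

A default of zeros is only a placeholder: it avoids the crash, but it also feeds one zero sample into the running mean if the logging code runs before the first real step, so whether it is harmless depends on how the real trainer uses the value.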

lose4578 commented 1 year ago

Hi, this is my env version: torch 1.11.0+cu113 cuda 11.3

I didn't see the same error, maybe your env had some problems.
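
To compare environments as in the reply above, a short script along these lines can report the installed versions. It is a sketch: the helper name `describe_environment` is made up for illustration, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_environment():
    """Collect the PyTorch and CUDA versions, if PyTorch is installed."""
    info = {"torch": None, "cuda_build": None, "cuda_available": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report empty values.
        return info
    info["torch"] = torch.__version__            # e.g. "1.11.0+cu113"
    info["cuda_build"] = torch.version.cuda      # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` is the CUDA version the PyTorch binary was built against, which can differ from the toolkit or driver version installed on the machine.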

gf-zhong commented 1 year ago

> Hi, this is my env version: torch 1.11.0+cu113 cuda 11.3
>
> I didn't see the same error, maybe your env had some problems.

Yes, there may be some issues with my environment. My PyTorch version is 2.0.0 and my CUDA version is 11.7. However, after the initialization described above, the issue is resolved and I can train and test normally.