microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

CUDA device-side assert error #235

Open varshaneya opened 4 years ago

varshaneya commented 4 years ago

Model I am using: LayoutLM

During training, I get a CUDA device-side assert error in the forward pass. A screenshot of the error is attached below:

image

During evaluation, I get a CUDA device-side assert error when I try to move data from the CPU to the GPU. A screenshot of the error is attached below:

image

PyTorch version: 1.6.0
CUDA toolkit: 10.1
OS: Ubuntu 16.04
Python: 3.6.10 (Anaconda)

Can somebody help me understand why this error occurs?

wolfshow commented 4 years ago

@varshaneya, which GPU do you use?

varshaneya commented 4 years ago

@wolfshow I use an NVIDIA DGX.

ruifcruz commented 3 years ago

@varshaneya this error message is quite vague. When I get it, I run the same code on the CPU instead, and the error message is much clearer.
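For what it's worth, when a device-side assert is reproduced on the CPU, it often turns out to be an index that is out of range, for example a label id or an embedding index that is larger than the model expects. That is only a guess for this issue, not a confirmed cause. A minimal sketch in plain Python of checking for that up front; `find_out_of_range`, the label list, and the class count are all hypothetical and not taken from the LayoutLM code:

```python
def find_out_of_range(ids, size):
    """Return (position, value) pairs for every id outside [0, size)."""
    return [(pos, value) for pos, value in enumerate(ids) if not 0 <= value < size]


# Hypothetical data: the model is assumed to have 5 label classes (ids 0..4),
# but one of the labels is 7, which would be an invalid index.
labels = [0, 3, 7, 1]
num_labels = 5

for pos, value in find_out_of_range(labels, num_labels):
    print(f"label at position {pos} has value {value}, expected 0..{num_labels - 1}")
```

The same kind of check could be applied to token ids against the vocabulary size, or to any other value used as an index.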

sreejith3534 commented 3 years ago

Try running the same script with CUDA_LAUNCH_BLOCKING=1 set at the start. It will give us more information about the actual issue.
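A minimal sketch of one way to do that from Python, assuming the training script is launched as a separate process. `run_with_launch_blocking` is a hypothetical helper, and the command below is only a stand-in that echoes the variable back; substitute the real training command.

```python
import os
import subprocess
import sys


def run_with_launch_blocking(args):
    """Run a command with CUDA_LAUNCH_BLOCKING=1 added to its environment."""
    # Copy the current environment and add the variable for the child process.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(args, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for the real training script: print the variable the child sees.
    result = run_with_launch_blocking(
        [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    )
    print(result.stdout.strip())
```

With the variable set, CUDA kernel launches run synchronously, so the Python stack trace should point at the operation that actually failed rather than at a later, unrelated call. Training will be slower, so this is for debugging only.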

jyotiyadav94 commented 2 years ago

@varshaneya, were you able to resolve the above issue?