Loss error during backward when using torch.nn.CrossEntropyLoss() with gpu

ivadomed / MEEG-Brainstorm

Repository for training MEEG datasets with the ivadomed framework in Brainstorm


Loss error during backward when using torch.nn.CrossEntropyLoss() with gpu #18

Closed ambroiseodt closed 2 years ago

ambroiseodt commented 2 years ago

I am trying to use a GPU (cuda:0) on the rosenberg server to run my code. During training, the data are moved to the device cuda:0, but I get an error during the backward pass of the loss in https://github.com/AmbroiseOdonnat/MEEG-Brainstorm/blob/ao/seizure_classification/Train.py. I use torch.nn.CrossEntropyLoss() and get the following error. Note that the code works perfectly on rosenberg and on my computer when I use a CPU. (Screenshot of the error: Capture d’écran 2022-03-08 à 09 18 12)
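To isolate this kind of failure from the rest of the training code, a small smoke test can help. The following is a minimal sketch, not the code from Train.py: the function name `backward_smoke_test`, the layer sizes, and the batch size are all hypothetical. It runs one forward and backward pass of a single linear layer with torch.nn.CrossEntropyLoss() on a chosen device.

```python
import torch
import torch.nn as nn


def backward_smoke_test(device: str = "cpu") -> float:
    """Run one forward/backward pass of a linear layer with CrossEntropyLoss.

    Hypothetical helper for illustration; sizes are arbitrary.
    """
    torch.manual_seed(0)
    model = nn.Linear(16, 4).to(device)      # 16 features, 4 classes
    criterion = nn.CrossEntropyLoss()

    # Inputs and targets must live on the same device as the model.
    inputs = torch.randn(8, 16, device=device)
    targets = torch.randint(0, 4, (8,), device=device)

    loss = criterion(model(inputs), targets)
    loss.backward()                           # the step that failed on the GPU
    return loss.item()


if __name__ == "__main__":
    # Fall back to the CPU when no GPU is visible.
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    print(device, backward_smoke_test(device))
```

If this runs on the CPU but fails on cuda:0, the problem is more likely in the CUDA setup than in the model or the loss.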

ambroiseodt commented 2 years ago

This looks similar to https://github.com/pytorch/pytorch/issues/56747. It seems to be an issue with CUDA 11.1 and linear layers / matrix multiplications. I downgraded from CUDA 11.1 to CUDA 10.2 and it works fine now.
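When comparing environments before and after such a downgrade, it can be useful to print which CUDA version the installed PyTorch build targets. This is a small sketch; the helper name `cuda_build_info` is hypothetical, and it only reports what PyTorch itself exposes, not the driver or toolkit installed on the server.

```python
import torch


def cuda_build_info() -> dict:
    """Collect the PyTorch version and the CUDA version it was built against."""
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": available,
        # Only query the device name when a GPU is actually visible.
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_build_info().items():
        print(f"{key}: {value}")
```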