jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Apache License 2.0
1.3k stars, 190 forks

CUDA error: device-side assert triggered #186

Closed: LinXin04 closed this issue 1 year ago

LinXin04 commented 1 year ago

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

When I run 17_Usign_a_hugging_face_model.ipynb, I get the error above.
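As the message suggests, one way to get a trustworthy stack trace is to make CUDA launches synchronous. The sketch below assumes the notebook is run top to bottom. The variable is read when the CUDA context is initialised, so it has to be set before torch (and pytorch_widedeep) touch the GPU. Running the same notebook on CPU is another option, because the failing operation then usually raises an ordinary Python error that names the offending value.

```python
import os

# Must be set before the CUDA context is created, so do it in the very first
# notebook cell, before importing torch / pytorch_widedeep.
# Exporting CUDA_LAUNCH_BLOCKING=1 in the shell before starting Jupyter also works.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and pytorch_widedeep) only after the variable is set
```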

jrzaurin commented 1 year ago

Hey @LinXin04, OK, I have never seen that error. Let me run it on an EC2 GPU instance when I have a sec and I will report back.

jrzaurin commented 1 year ago

@5uperpalo or @kd1510, if you have a sec and want to fight with CUDA, maybe you want to have a look at this.

kd1510 commented 1 year ago

What is a "CUDA error: device-side assert triggered"?

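A "CUDA error: device-side assert triggered" is often caused by a mismatch between the number of labels and the number of output units, or by passing the loss function inputs it does not expect. For example, a cross-entropy loss such as nn.CrossEntropyLoss expects integer class indices in the range 0 to n_classes - 1. If the classes are numbered 1 to n_classes instead, the label n_classes is out of bounds, and on the GPU that shows up as this assert. To solve it, make sure the number of output units matches the number of classes, and make sure both your targets and your outputs fall within the range expected by the loss function (criterion) you chose.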
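A quick sanity check before training is to look for targets outside 0 to n_classes - 1. The sketch below is plain Python. `out_of_range_targets` is a hypothetical helper, not part of pytorch-widedeep, and the example labels are made up for illustration.

```python
from typing import Iterable, List


def out_of_range_targets(targets: Iterable[int], n_classes: int) -> List[int]:
    """Return the distinct targets that fall outside [0, n_classes - 1].

    Cross-entropy style losses use the target to index the output logits,
    so any value outside this range can trigger the device-side assert on GPU.
    """
    return sorted({t for t in targets if t < 0 or t >= n_classes})


# Hypothetical example: 3 classes labelled 1, 2, 3 instead of 0, 1, 2
labels = [1, 2, 3, 3, 1]
print(out_of_range_targets(labels, n_classes=3))  # [3]
```

An empty list means every label can be used as an index into an output layer with `n_classes` units.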

Can you confirm this isn't the cause?

jrzaurin commented 1 year ago

@LinXin04, is your target multi-class, and if so, do the labels start at 0?

LinXin04 commented 1 year ago

Thanks. It is multi-class, and my labels don't start at 0.
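The usual fix is to remap the labels to contiguous indices 0 to n_classes - 1 before building the training data. Here is a minimal sketch in plain Python, assuming the labels are hashable and sortable. `remap_labels` is a hypothetical helper, not a pytorch-widedeep function. scikit-learn's `LabelEncoder` (`fit_transform` / `inverse_transform`) does the same job if that is already a dependency.

```python
from typing import Dict, List, Sequence, Tuple


def remap_labels(labels: Sequence[int]) -> Tuple[List[int], Dict[int, int]]:
    """Map arbitrary class labels to contiguous indices 0..n_classes-1.

    Returns the encoded labels plus the original->index mapping, so that
    predictions can be translated back to the original labels afterwards.
    """
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


encoded, mapping = remap_labels([1, 2, 3, 3, 1])
print(encoded)  # [0, 1, 2, 2, 0]
print(mapping)  # {1: 0, 2: 1, 3: 2}

# Invert the mapping to turn predicted indices back into the original labels
inverse = {idx: label for label, idx in mapping.items()}
print([inverse[i] for i in [2, 0]])  # [3, 1]
```

Keeping the mapping around matters because the model's predictions come back as indices, not as the original labels.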