Closed LinXin04 closed 1 year ago
Hey @LinXin04... OK, I have never seen that error. Let me run it on an EC2 GPU instance when I have a sec and I will report back.
@5uperpalo or @kd1510 if you have a sec and want to fight with CUDA maybe you want to have a look at this
WHAT IS A CUDA ERROR: DEVICE-SIDE ASSERT TRIGGERED?
A CUDA error: device-side assert triggered is an error that’s often caused when you either have inconsistency between the number of labels and output units or you input the loss function incorrectly. To solve it, you need to make sure your output units match the number of classes and that your output layer returns values in the range of the loss function (criterion) that you chose.
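A quick way to test for the label/output mismatch described above is to check the target values on the CPU before training. This is a minimal sketch in plain Python; `find_invalid_labels`, `labels` and `n_classes` are hypothetical names used for illustration, not part of any library, and it assumes the loss expects integer class indices from 0 to `n_classes - 1`.

```python
def find_invalid_labels(labels, n_classes):
    """Return the sorted distinct labels that fall outside [0, n_classes - 1]."""
    return sorted({int(y) for y in labels if not 0 <= int(y) < n_classes})


# Hypothetical 3-class target encoded as 1, 2, 3 instead of 0, 1, 2
labels = [1, 2, 3, 1, 2]
bad = find_invalid_labels(labels, n_classes=3)
print(bad)  # any value listed here would index past the last output unit
```

An empty list means every label maps to an existing output unit, so the cause is probably elsewhere.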
Can you confirm this isn't the cause?
@LinXin04 is your target multi-class and, if so, does it start at 0?
Thanks. It is multi-class, and it doesn't start at 0.
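If the labels do not start at 0, one possible fix is to remap them to contiguous indices before training. The sketch below is plain Python under that assumption; `encode_labels` is a hypothetical helper, and scikit-learn's `LabelEncoder` performs the same kind of mapping if you prefer a library call.

```python
def encode_labels(labels):
    """Map arbitrary class labels to contiguous indices 0..n_classes-1."""
    classes = sorted(set(labels))
    to_index = {c: i for i, c in enumerate(classes)}
    return [to_index[y] for y in labels], to_index


# Hypothetical target whose classes are encoded as 1, 2, 3
encoded, mapping = encode_labels([1, 2, 3, 1, 2])
print(encoded)
print(mapping)
```

Keep `mapping` around so that predictions can be translated back to the original label values.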
When I run 17_Usign_a_hugging_face_model.ipynb, I get the following error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
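The message suggests passing CUDA_LAUNCH_BLOCKING=1. In a notebook, one way to do that (a sketch, assuming the cell runs before anything has initialized CUDA, ideally as the first cell after a kernel restart) is to set the environment variable from Python:

```python
import os

# Make CUDA kernel launches synchronous so the stack trace points at the
# call that actually failed. Set this before the first CUDA call is made.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Running the same batch on the CPU is another option, since out-of-range indices are usually reported there with a more readable error.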