Open showfaker66 opened 2 years ago
Thank you for reporting! Could you please provide the full error trace? (It is best to run with the CUDA_LAUNCH_BLOCKING=1 flag set, so that any low-level CUDA error is raised at the call that actually caused it.)
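For anyone unsure how to set that flag, especially on Windows where the `VAR=1 python ...` shell syntax is not available: one option is to launch the training script from a small Python wrapper that sets the variable in the child process environment. This is only a sketch. The helper name `blocking_cuda_env` is made up for illustration, and the `train.py` invocation in the comment assumes the script from this issue.

```python
import os


def blocking_cuda_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    # The variable has to be in place before CUDA is initialised,
    # so it is set for the child process rather than mid-run.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Hypothetical usage (uncomment and adjust the script name and arguments):
# import subprocess, sys
# subprocess.run([sys.executable, "train.py"], env=blocking_cuda_env(), check=True)
```

With the flag set, kernel launches run synchronously, so the traceback should point at the operation that tripped the assert instead of a later call such as `loss.backward()`.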
Thank you for your reply! The complete error appears below.

C:/cb/pytorch_1000000000000/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train.py", line 274, in <module>
    train(args)
  File "train.py", line 174, in train
    train_loss, _, _, train_acc = train_epoch(slrt_model, train_loader, cel_criterion, sgd_optimizer, device)
  File "I:\action_recognition\spoter-main-hand-sign\spoter\utils.py", line 25, in train_epoch
    loss.backward()
  File "D:\anaconda\envs\ctpgr\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\anaconda\envs\ctpgr\lib\site-packages\torch\autograd\__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
@matyasbohacek Hi, have you resolved this error? I am also getting the same error.
I'm having the same problem. Is there a solution for it?
Hey, I have found a solution!
Go to datasets/czech_slr_dataset.py and, around line 105, find the following:
label = torch.Tensor([self.labels[idx] - 1])
That -1 is the cause of our problems. With WLASL100 the labels already go from 0 to 99, so when we call the class CzechSLRDataset, a sample with label 0 comes back as something like tensor([[-1]]), but there is no class labelled -1. This explains the CUDA error and the failed assertion t >= 0 && t < n_classes.
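To check whether your own label file is affected before touching the GPU at all, something like the sketch below may help. It is plain Python and just repeats the condition from the assertion on the CPU. The function name `find_invalid_labels` is made up for this example, and the label list is toy data, not the real WLASL100 file.

```python
def find_invalid_labels(labels, n_classes):
    """Return (index, label) pairs for labels outside [0, n_classes)."""
    return [
        (i, int(t))
        for i, t in enumerate(labels)
        if not 0 <= int(t) < n_classes
    ]


# Toy labels that are already zero-based, as described for WLASL100.
raw_labels = [0, 1, 99]
# What the "- 1" in the dataset class does to them.
shifted = [t - 1 for t in raw_labels]

print(find_invalid_labels(raw_labels, n_classes=100))  # []
print(find_invalid_labels(shifted, n_classes=100))     # [(0, -1)]
```

An empty list means every label satisfies `t >= 0 && t < n_classes`; anything else tells you which samples would trigger the device-side assert.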
Taking that into account, the following fix worked for me:
label = torch.Tensor([self.labels[idx]]) # Just drop the "-1"
Hope this helps! :D
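One caveat on dropping the "-1" outright: presumably it was there for label files that start at 1 (that is my assumption, I have not checked the original dataset). If you need to support both kinds of files, a rough sketch is to make the offset explicit and fail early with a readable message. `to_class_index` and its `first_label` parameter are hypothetical names, not part of the repository.

```python
def to_class_index(raw_label, first_label, n_classes):
    """Map a raw label to a zero-based class index.

    `first_label` is the smallest label used in the label file
    (0 for zero-based files, 1 for one-based files).
    """
    index = int(raw_label) - first_label
    if not 0 <= index < n_classes:
        # Raised on the CPU, so the message names the offending label
        # instead of surfacing later as a device-side assert.
        raise ValueError(
            f"label {raw_label!r} maps to class index {index}, "
            f"expected a value in [0, {n_classes})"
        )
    return index


print(to_class_index(0, first_label=0, n_classes=100))    # 0
print(to_class_index(100, first_label=1, n_classes=100))  # 99
```

In the dataset class this would replace the hard-coded subtraction, e.g. by building the label tensor from `to_class_index(self.labels[idx], first_label, n_classes)`, with `first_label` and `n_classes` passed in from your own configuration.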
RuntimeError: CUDA error: device-side assert triggered.

for i, data in enumerate(dataloader):
    inputs, labels = data
    inputs, labels = Variable(inputs), Variable(labels)-1