cuda runtime error

leehigh commented 6 years ago

Hi, I'm new and I have a very basic question.

When I run python train.py, I get an error like this:

Traceback (most recent call last):
  File "train.py", line 232, in <module>
    loss = loss + loss_calc(out[i+1],label[i+1],gpu0)
  File "train.py", line 134, in loss_calc
    label = Variable(label).cuda(gpu0)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 279, in cuda
    return CudaTransfer.apply(self, device_id, async)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/tensor.py", line 149, in forward
    return i.cuda(device_id, async=async)
  File "/usr/local/lib/python2.7/dist-packages/torch/_utils.py", line 66, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:18

I found https://github.com/pytorch/pytorch/issues/1010, but it doesn't help.

isht7 commented 6 years ago

I think this is caused by some of the labels being greater than max_label (the following is pasted from the README):

Please note that labels should be denoted by contiguous values (starting from 0) in the ground truth images. For example, if there are 7 (no_labels) different labels, then each ground truth image must have these labels as 0,1,2,3,...6 (no_labels-1). no_labels is the parameter corresponding to the maximum number of labels. It is 21 by default.
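One way to catch this before the label is moved to the GPU is to validate it on the CPU. The sketch below is a minimal illustration, assuming each ground-truth image is loaded as an integer NumPy array. `check_labels` and `remap_labels` are hypothetical helpers, not part of pytorch-deeplab-resnet. The second one remaps non-contiguous class values to 0..N-1 using one fixed list of class values for the whole dataset, so every image gets the same mapping.

```python
import numpy as np


def check_labels(label, no_labels):
    """Raise ValueError if any pixel label is outside 0..no_labels-1.

    `label` is assumed (hypothetically) to be an integer array of shape H x W.
    """
    bad = (label < 0) | (label >= no_labels)
    if bad.any():
        found = np.unique(label[bad])
        raise ValueError(
            "labels must be in 0..%d, found out-of-range values %s"
            % (no_labels - 1, found.tolist()))


def remap_labels(label, class_values):
    """Map every original class value to a contiguous id 0..len(class_values)-1.

    `class_values` must list every value that occurs anywhere in the dataset;
    ids are assigned in ascending order of the original values, so the same
    value always maps to the same id in every image.
    """
    values = np.asarray(sorted(class_values))
    # Position of each pixel value in the sorted list of known values.
    idx = np.searchsorted(values, label)
    idx = np.clip(idx, 0, len(values) - 1)
    # If the value at that position differs, the pixel value was not listed.
    if not np.all(values[idx] == label):
        raise ValueError("label contains a value not in class_values")
    return idx
```

Calling check_labels on each ground-truth image in the data-loading code turns the opaque device-side assert into a Python error that names the offending values. Note that CUDA errors are reported asynchronously, so the line shown in a traceback like the one above is not necessarily where the bad label was first used.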

isht7 commented 6 years ago

You may also look at #18 (read the later comments there).

isht7 / pytorch-deeplab-resnet

cuda runtime error #14