isht7 / pytorch-deeplab-resnet

DeepLab resnet v2 model in pytorch
MIT License

Issue while evaluating trained model #8

Closed: sequae92 closed this issue 7 years ago

sequae92 commented 7 years ago

RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31

I have been fine-tuning this model on my custom dataset of images. The ground truth has only two labels, 0 and 255. However, when I test an image using the evalpyt2.py script, I get the following error:

```
{'--gpu0': '0', '--help': False, '--snapPrefix': 'VOC12scenes', '--testGTpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/BinaryResizedGroundtruthPng/', '--testIMpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/ResizedOriginalImages/', '--visualize': True}
VOC12scenes
Traceback (most recent call last):
  File "evalpyt2.py", line 87, in <module>
    model.load_state_dict(saved_state_dict)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 335, in load_state_dict
    own_state[name].copy_(param)
RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/THCTensorCopy.cu:31
```

I have cross-checked, and the sizes of my input image and the ground truth do match. I am not sure what is causing this error.

Any help would be much appreciated.

isht7 commented 7 years ago

Thank you for your report. I have fixed the possible cause of this error. Please pull this file and run again; if you still encounter an error, please report it. Also, don't forget to use the --NoLabels flag. As a separate note, you cannot use 0 and 255 as your 2 labels. Your labels must be contiguous, i.e. 0 and 1 if you have only 2 labels (otherwise I expect an error when you train using train.py, as the loss function expects contiguous labels). Be aware that in evalpyt2.py, label 255 is merged with 0 (see here) because originally in VOC, the 255 label is treated as background. You may want to remove that line.
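Something along these lines should work for the relabeling (a minimal sketch; the remap_mask helper and the paths are placeholders, assuming single-channel ground-truth PNGs):

```python
# Hypothetical helper (not part of this repo): convert binary ground-truth
# masks with values {0, 255} into contiguous labels {0, 1}.
import numpy as np
from PIL import Image

def remap_mask(in_path, out_path):
    gt = np.array(Image.open(in_path))  # pixel values are 0 or 255
    gt[gt == 255] = 1                   # labels become contiguous: 0 and 1
    Image.fromarray(gt).save(out_path)

# Example (hypothetical paths):
remap_mask('gt/mask_0001.png', 'gt_remapped/mask_0001.png')
```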

isht7 commented 7 years ago

Update: I have just removed from evalpyt2.py the line that merges label 255 with 0, as this was not done by the authors of the DeepLab paper originally.

sequae92 commented 7 years ago

Thank you for such a prompt response. I was able to bypass that error after incorporating your changes.

I have a few more clarifications with regard to the same:

  1. I was able to train with image labels 0 and 255 and did not come across any error. Is there an explicit condition that constrains training to contiguous labels only? If so, how can it be bypassed? In my use case, the ground truth I am given has binary image labels only (0 and 255 in this case).

  2. Also, I wanted to clarify the rationale behind converting images to (513, 513, 3) in evalpyt2.py. No such conversion is done while training on the dataset.

I am now facing this error:

```
img[:img_temp.shape[0], :img_temp.shape[1], :] = img_temp
ValueError: could not broadcast input array from shape (1680,1224,3) into shape (513,513,3)
```

Thanks for your time.

sequae92 commented 7 years ago

Update: Never mind the 2nd question; I have rectified it on my side.

I now receive this error, and there is not much documentation online on how to resolve it:

```
{'--NoLabels': '2', '--gpu0': '0', '--help': False, '--snapPrefix': 'VOC12scenes', '--testGTpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/TestImageGroundtruth/', '--testIMpath': '/mnt/VTSRAID01/SAMI_EXPERIMENTS/Segmentation/DataForAnalytics/GIS_ALL_IMAGES/TestImage/', '--visualize': True}
VOC12scenes
Traceback (most recent call last):
  File "evalpyt2.py", line 106, in <module>
    output = model(Variable(torch.from_numpy(img[np.newaxis, :].transpose(0,3,1,2)).float(), volatile=True).cuda(gpu0))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/spondon/testbed/segmentation/pytorch-deeplab-resnet/deeplab_resnet.py", line 204, in forward
    temp1 = torch.max(out[0], x2Out_interp)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 447, in max
    return Cmax()(self, dim)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/pointwise.py", line 215, in forward
    self._max_buffer = a.gt(b).type_as(a)
RuntimeError: sizes do not match at /b/wheel/pytorch-src/torch/lib/THC/generated/../THCTensorMathCompareT.cuh:65
```

isht7 commented 7 years ago

Two points.

  1. You did not encounter an error because, at train time, label 255 is merged with 0 [see here]. Since 255 is merged into 0, only label 0 remains, so you were effectively training on images whose labels are all 0. I recommend that you change your labels from 255 to 1 and then train again.
  2. As for the error you are getting, can you verify the size of img just before it is passed to the model at line 106? A square image must be passed to the model; failing to do so results in exactly this error. Can you tell me what print(img.shape) outputs just before the line which produces the error? (See the sketch after this list.)
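For example, a hypothetical debug check (using the variable name img from evalpyt2.py) placed just before the model call:

```python
# Hypothetical check, placed just before the model call at line 106 of
# evalpyt2.py: the branches merged by torch.max only line up for square inputs.
print(img.shape)  # should be (513, 513, 3)
assert img.shape[0] == img.shape[1], 'evalpyt2.py expects a square image'
```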

isht7 commented 7 years ago

> Also, I wanted to clarify the rationale behind converting images to (513, 513, 3) in evalpyt2.py. No such conversion is done while training on the dataset.

This is what the original Caffe deeplab-resnet does at test time (have a look at the data layer in their test.prototxt), so we did the same. Because only square images can be sent through the model, we pad the image so that its size becomes (513, 513, 3).
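A minimal sketch of that padding step (the pad_to_square helper is illustrative, not the repo's exact code; it assumes the image is no larger than 513 on either side, which is why the (1680, 1224, 3) image above triggered the broadcast ValueError until it was resized):

```python
# Illustrative padding helper: embed an HxWx3 image in the top-left corner
# of a 513x513x3 zero canvas, as evalpyt2.py does before the forward pass.
import numpy as np

def pad_to_square(img_temp, dim=513):
    # Assumes img_temp.shape[0] <= dim and img_temp.shape[1] <= dim;
    # otherwise the assignment below cannot broadcast.
    img = np.zeros((dim, dim, 3), dtype=img_temp.dtype)
    img[:img_temp.shape[0], :img_temp.shape[1], :] = img_temp
    return img
```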

isht7 commented 7 years ago

If there are any additional issues, let me know.