meetps / pytorch-semseg

Semantic Segmentation Architectures Implemented in PyTorch
https://meetshah.dev/semantic-segmentation/deep-learning/pytorch/visdom/2017/06/01/semantic-segmentation-over-the-years.html
MIT License
3.38k stars 799 forks

Size inconsistency in U-Net implementation. #43

Open xiaofengqing opened 6 years ago

xiaofengqing commented 6 years ago

When I train the unet model, I get this error: RuntimeError: inconsistent tensor sizes at /b/wheel/pytorchsrc/torch/lib/THC/generic/THCTensorMath.cu:141

My input image size is 256x256.

shehabk commented 6 years ago

I have exactly the same issue. Can anyone help me out?

hexiangquan commented 6 years ago

def __getitem__(self, index):
    img_name = self.files[self.split][index]
    img_path = self.root + '/' + self.split + '/' + img_name
    lbl_path = self.root + '/' + self.split + 'annot/' + img_name
    print(img_path)
    print(lbl_path)

    img = m.imread(img_path)
    img = m.imresize(img, [360, 480], interp='nearest')  # add this line
    img = np.array(img, dtype=np.uint8)

    lbl = m.imread(lbl_path)
    lbl = m.imresize(lbl, [360, 480], interp='nearest')  # add this line
    lbl = np.array(lbl, dtype=np.int32)
    print(lbl.shape)

shehabk commented 6 years ago

Resizing the image did not work for me; I still get the same error. Does the current unet implementation work with (256, 256)? If not, what image size should be used?

bobbqe commented 6 years ago

I have the same problem. Did anyone find the solution?

mileyan commented 6 years ago

The problem is that unet does not use any padding in its convolution layers, so the output size is not equal to the input size. The label, however, has the same size as the input.
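
The mismatch can be checked with simple arithmetic. The following is a minimal sketch, assuming the layout from the U-Net paper (four down/up levels, two unpadded 3x3 convolutions per block, 2x2 max pooling, 2x upsampling, skip connections cropped to fit). The function name `unet_valid_output_size` is hypothetical and not part of this repository, and the repository's own layers may differ in detail.

```python
def unet_valid_output_size(size, depth=4):
    """Return (output_size, pools_even) for an unpadded U-Net.

    Assumed layout (from the paper, not read from this repository):
    `depth` down/up levels, two 3x3 convs without padding per block,
    2x2 max pooling with stride 2, and 2x upsampling.
    """
    pools_even = True
    for _ in range(depth):
        size -= 4               # two unpadded 3x3 convs lose 2 px each
        if size % 2:
            pools_even = False  # pooling would silently drop a row/column
        size //= 2              # 2x2 max pooling, stride 2
    size -= 4                   # bottleneck block
    for _ in range(depth):
        size = size * 2 - 4     # 2x upsampling, then two unpadded 3x3 convs
    if size <= 0:
        raise ValueError("input is too small for this depth")
    return size, pools_even


for n in (572, 256):
    print(n, unet_valid_output_size(n))
```

Under these assumptions a 572x572 input yields a 388x388 output, while a 256x256 input yields 68x68 and hits an odd-sized feature map before one of the pooling steps, so neither the label size nor the skip connections line up without extra cropping.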

meetps commented 6 years ago

I'm aware of this issue: the U-Net implementation doesn't support all resolutions. I need to fix this.

masahi commented 6 years ago

Setting padding to 1 instead of 0 worked for me.

JustWon commented 6 years ago

@masahi OMG.. you are the winner.. It works fine but I should see the result images after training.

JustWon commented 6 years ago

@masahi After training the unet, I ran validate.py, but the following error occurred.

image

masahi commented 6 years ago

@JustWon That error is not related to your change in padding. Look elsewhere.

L0SG commented 6 years ago

Maybe late to the discussion, but since I PR'd the U-Net fix (#35, see issue #21), here are my comments.

A strict U-Net implementation does not use padding (Fig. 1 in the paper, https://arxiv.org/pdf/1505.04597.pdf), which is why padding is 0 instead of 1. Several other implementations follow this (TF#1, TF#2; note the "valid" padding). So the input size should be 572x572, and the output size should be 388x388.

So the easiest method would be to resize the input and output images to match the respective sizes.

Using padding wouldn't hurt, since it conveniently preserves the size, but it is not the exact architecture from the paper, so use it at your own risk when it comes to proper benchmarks.

A quick "fix" would be to raise a readable error when the I/O sizes do not match, or to provide an on/off switch for the padding.
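
Both suggestions can be sketched in a few lines of PyTorch. This is an illustration under assumptions, not the repository's code: the class `DoubleConv`, its `padded` flag, and the helper `check_same_size` are hypothetical names.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Two 3x3 conv + ReLU layers; `padded` switches padding on or off."""

    def __init__(self, in_ch, out_ch, padded=True):
        super().__init__()
        pad = 1 if padded else 0  # padding=1 keeps H and W for a 3x3 kernel
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=pad),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


def check_same_size(logits, target):
    """Raise a readable error instead of a low-level tensor size mismatch."""
    if logits.shape[-2:] != target.shape[-2:]:
        raise ValueError(
            f"network output is {tuple(logits.shape[-2:])} but the label is "
            f"{tuple(target.shape[-2:])}; enable padding or crop the label"
        )


x = torch.zeros(1, 3, 256, 256)
padded_out = DoubleConv(3, 8, padded=True)(x)   # spatial size stays 256x256
valid_out = DoubleConv(3, 8, padded=False)(x)   # spatial size shrinks to 252x252
print(padded_out.shape, valid_out.shape)
```

With the flag off, each block behaves like the unpadded blocks in the paper; with it on, the block keeps the spatial size, which is what the `padding=1` workaround mentioned earlier in this thread does.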

irexyc commented 6 years ago

@L0SG Hi, thanks for your explanation.

I am confused about the input size and output size. According to the paper, it uses the overlap-tile strategy to segment arbitrarily large images. Does that mean we shouldn't resize the label image, but instead select part of the label image (388 x 388) and mirror the real image (388 x 388 -> 572 x 572)?

I am new to segmentation. Does changing the label size have little effect on the final accuracy? By the way, when we do data augmentation, should we use different resize methods for the input image and the label? (https://github.com/pytorch/vision/issues/9#issuecomment-294629198 says the input image uses bilinear while the label uses nearest-neighbour.)

L0SG commented 6 years ago

@irexyc Yes, you're right. For the net to use the "valid" padding strategy of its convolutions, you may want to tile the (388x388) image so that it has a shape of 572x572, as in Fig. 2 of the paper (the word "resize" in my previous comment is something of a misnomer here; I use the model with "tiled" CT scan images). This shows an example with mirror-padding. This may further clarify the I/O.
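
As a rough illustration of the mirror-padding step, the sketch below reflect-pads a 388x388 tile to 572x572 with NumPy (92 pixels per side). The helper name `mirror_pad_tile` is hypothetical. It also simplifies the paper's overlap-tile strategy: there, interior tiles take their context from the real neighbouring pixels and mirroring is only needed at the image border.

```python
import numpy as np


def mirror_pad_tile(tile, out_size=572):
    """Reflect-pad an (H, W) or (H, W, C) tile symmetrically to out_size."""
    h, w = tile.shape[:2]
    pad_h, pad_w = out_size - h, out_size - w
    if pad_h < 0 or pad_w < 0 or pad_h % 2 or pad_w % 2:
        raise ValueError("tile must be smaller than out_size by an even amount")
    pads = [(pad_h // 2, pad_h // 2), (pad_w // 2, pad_w // 2)]
    pads += [(0, 0)] * (tile.ndim - 2)  # leave a trailing channel axis untouched
    return np.pad(tile, pads, mode="reflect")


tile = np.arange(388 * 388, dtype=np.float32).reshape(388, 388)
padded = mirror_pad_tile(tile)
print(padded.shape)
```

The network then sees the padded 572x572 input, and its 388x388 output corresponds to the original, unpadded tile, so the label for that tile does not need to be resized.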

I think "bilinear for the input image and nearest-neighbor for the binary segmentation mask" is general practice, since bilinear gives a more natural, smooth interpolation for images, while we want to keep the mask binary rather than interpolate it.
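
A minimal sketch of that practice with `torch.nn.functional.interpolate`, assuming a float image batch of shape (N, C, H, W) and an integer label batch of shape (N, H, W). The helper name `resize_pair` and the example sizes are hypothetical.

```python
import torch
import torch.nn.functional as F


def resize_pair(image, mask, size):
    """Resize an image bilinearly and its label mask with nearest-neighbour."""
    image_r = F.interpolate(image, size=size, mode="bilinear", align_corners=False)
    # Nearest-neighbour only copies existing label values, so no new classes appear.
    mask_r = F.interpolate(mask[:, None].float(), size=size, mode="nearest")
    return image_r, mask_r[:, 0].long()


image = torch.rand(1, 3, 360, 480)
mask = torch.randint(0, 12, (1, 360, 480))  # 12 classes, chosen for illustration
image_r, mask_r = resize_pair(image, mask, (256, 256))
print(image_r.shape, mask_r.shape)
```

Interpolating the mask bilinearly would instead blend neighbouring class indices into values that correspond to no class at all.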

lfdeep commented 5 years ago

setting padding to 1 instead of 0 worked for me.

Hello, I have the same problem! How do I set padding to 1?

shariq-ali commented 4 years ago

Has anyone found the solution? Please help me; I'm new to machine learning and getting the same error.

alar0330 commented 4 years ago

TL;DR: Size inconsistency is NOT an issue of the U-Net implementation for the original version from the paper referenced above. The original paper used a mirror-tile strategy for input images to yield a desired output dimension.

Source: https://arxiv.org/pdf/1505.04597.pdf

ckolluru commented 4 years ago

@lfdeep

Change padding in lines 174-183 in utils.py, unetConv2 function

# nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding):
# the last positional argument is the padding, set to 1 here.
if is_batchnorm:
    self.conv1 = nn.Sequential(
        nn.Conv2d(in_size, out_size, 3, 1, 1), nn.BatchNorm2d(out_size), nn.ReLU()
    )
    self.conv2 = nn.Sequential(
        nn.Conv2d(out_size, out_size, 3, 1, 1), nn.BatchNorm2d(out_size), nn.ReLU()
    )
else:
    self.conv1 = nn.Sequential(nn.Conv2d(in_size, out_size, 3, 1, 1), nn.ReLU())
    self.conv2 = nn.Sequential(nn.Conv2d(out_size, out_size, 3, 1, 1), nn.ReLU())

Make sure to check with the summary function that this is what you want to do.