zanilzanzan / FuseNet_PyTorch

Joint scene classification and semantic segmentation with FuseNet

RuntimeError: Need input.size[1] == 3 but got 1 instead. #1

Closed · Jaiy closed this issue 6 years ago

Jaiy commented 6 years ago

Hi, thank you for releasing the code, first of all. When I train on the NYU dataset and run the script Train_FuseNet.py, I get the error `RuntimeError: Need input.size[1] == 3 but got 1 instead.` Could you give me some advice on how to solve this problem? Thanks a lot!

zanilzanzan commented 6 years ago

Hello there Jaiy, according to the error message, the first layer of the depth encoder expects a 3-channel input, but it shouldn't. :D Its weights are initialized from VGG-16's first convolutional layer (which has three input channels) by averaging over the channel dimension:

```python
# 1. Initialize "feats" referencing the weights of the VGG-16 feature-extraction component:
feats = list(models.vgg16(pretrained=True).features.children())

# 2. Compute the average "avg" of the first layer's weights over dimension 1, the channel dimension:
avg = torch.mean(feats[0].cuda(gpu_device).weight.data, dim=1)

# 3. Create the first convolutional layer of the depth encoder and initialize its weights from avg:
self.conv11d = nn.Conv2d(1, 64, kernel_size=3, padding=1).cuda(gpu_device)
self.conv11d.weight.data = avg
```

I don't know exactly how, but it looks like there is a problem at this part. Could you please check the dimensions of the first depth layer by adding the following after the lines above?

```python
print('conv11d shape: ', self.conv11d.weight.data.size())
```

Jaiy commented 6 years ago

Hi, thanks a lot for replying. I added the print after that line, and it shows `('conv11d shape: ', (64L, 3L, 3L))`. But I'm still confused about the problem: I simply downloaded the processed .h5py dataset with 40 annotations and 10 classes that you provided, and I'm puzzled about how to handle it.

zanilzanzan commented 6 years ago

Hey there, the shape should be (64L, 1L, 3L, 3L): Conv2d weights are four-dimensional (out_channels, in_channels, kernel_height, kernel_width), not three. I don't get why you get that shape offhand, but one likely culprit is that torch.mean(..., dim=1) squeezes out the reduced dimension by default, which would leave avg three-dimensional.
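For what it's worth, here is a minimal CPU-only sketch of the initialization with the channel dimension preserved via keepdim=True (the .cuda(gpu_device) calls are omitted for brevity; on older PyTorch versions without keepdim, avg.unsqueeze(1) achieves the same):

```python
import torch
import torch.nn as nn
from torchvision import models

# Take VGG-16's first conv layer; its weights have shape (64, 3, 3, 3).
feats = list(models.vgg16(pretrained=True).features.children())

# Average over the channel dimension, keeping it as a singleton,
# so avg has shape (64, 1, 3, 3) instead of (64, 3, 3).
avg = torch.mean(feats[0].weight.data, dim=1, keepdim=True)

# The depth encoder's first layer expects exactly this weight shape.
conv11d = nn.Conv2d(1, 64, kernel_size=3, padding=1)
conv11d.weight.data = avg
print('conv11d shape: ', conv11d.weight.data.size())  # torch.Size([64, 1, 3, 3])
```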

Apart from that, the depth and RGB images are stored in the h5 file with the format (N, C, H, W); N: number of images, C: channels, H: height, W: width. Segmentation labels are stored with the format (N, H, W), and the scene classes are stored as a one-dimensional array. To get to know the contents of the .h5py file, you can use the keys method, which gives you a list of your dataset and group names as unicode strings:

```python
import h5py

h5file = h5py.File('nyu_class_10_db.h5', 'r')
names = h5file.keys()
print(names)
```

Then you can run the necessary operations on the file using these key names, such as checking the size of a specific dataset or printing out specific images. Besides, the assignment of the datasets to numpy arrays is done in /utils/data_utils_class.py; see the sketch below for the basic pattern.
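As an illustration, a short sketch of that pattern (the exact key names and shapes depend on the file you downloaded):

```python
import h5py
import numpy as np

with h5py.File('nyu_class_10_db.h5', 'r') as h5file:
    rgb_train = np.array(h5file['rgb_train'])      # (N, C, H, W)
    label_train = np.array(h5file['label_train'])  # (N, H, W)
    class_train = np.array(h5file['class_train'])  # (N,)

    print('rgb_train size:', rgb_train.shape)
    print('first label image:', label_train[0])
```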

zanilzanzan commented 6 years ago

Have you managed to make any progress, Jaiy?

Jaiy commented 6 years ago

Hi, zan. Unfortunately, I'm still stuck on this issue and feel very puzzled. I printed the list of unicode strings of the dataset as you described, and it prints successfully: `[u'class_test', u'class_train', u'depth_test', u'depth_train', u'label_test', u'label_train', u'rgb_test', u'rgb_train']`. Then I checked the path and the file, and there was no problem with them, so I have no idea what is wrong. I had planned to solve this problem and then tell you, but it seems I have failed; I'll search and study the problem again. Thanks again for your kind help! You are one of the kindest and most patient authors I've ever met. Best!

zanilzanzan commented 6 years ago

Hey Jaiy, thank you for your kind comment. I wish I could provide you with deeper insight to solve the issue right away. Please do not hesitate to ask if you have more questions, either about this process or in general. Good luck!