hellochick / ICNet-tensorflow

TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

About the pad at the max_pool layer? #31

Closed: zhuanjiao2222 closed this issue 5 years ago

zhuanjiao2222 commented 6 years ago

The original ICNet pads the input at the max_pool layer, but this code does not. Why? @hellochick

hellochick commented 6 years ago

Hey @zhuanjiao2222, that is because I changed the padding method from 'VALID' to 'SAME' in the first three layers.
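To make the difference concrete, here is a minimal pure-Python sketch (no TensorFlow required) of how 'SAME' pooling distributes its padding compared with an explicit one-pixel zero pad followed by 'VALID' pooling. The 3x3 kernel, stride 2, and the input lengths 256 and 257 are assumptions chosen for illustration, not values taken from this thread, and the helper names are hypothetical.

```python
import math


def same_pool(size, kernel, stride):
    """Output length and (pad_before, pad_after) under TensorFlow-style 'SAME' padding."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2  # any odd leftover pixel goes to the trailing side
    return out, (before, total - before)


def padded_valid_pool(size, kernel, stride, pad):
    """Output length when `pad` zeros are added on both sides, then 'VALID' pooling."""
    out = (size + 2 * pad - kernel) // stride + 1
    return out, (pad, pad)


# Assumed settings for illustration: kernel 3, stride 2, explicit pad 1.
for size in (256, 257):
    print(size, same_pool(size, 3, 2), padded_valid_pool(size, 3, 2, 1))
```

Under these assumptions both variants produce the same output length, but for an even input length 'SAME' pads only the trailing side, so its pooling windows sit one pixel away from those of the symmetric pad. For an odd length such as 257 the two paddings coincide.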

zhuanjiao2222 commented 6 years ago

Hi @hellochick, I think I know why the accuracy of your code is lower than that of the original ICNet: your code differs slightly from the original. I changed your code to match the original ICNet as follows:

(1) The original ICNet takes a 1025 x 2049 image as input.

(2) Before the max_pool layer, I added a zero-padding layer: zero_padding(paddings=1, name='padding0')

(3) The interp layer in Caffe is different from yours, so I changed the interp layer in your code as follows:

    def interp(self, input, shrink_factor=1, zoom_factor=1, name=None):
        ori_h, ori_w = input.get_shape().as_list()[1:3]
        # shrink
        ori_h = (ori_h - 1) * shrink_factor + 1
        ori_w = (ori_w - 1) * shrink_factor + 1
        # zoom
        ori_h = ori_h + (ori_h - 1) * (zoom_factor - 1)
        ori_w = ori_w + (ori_w - 1) * (zoom_factor - 1)
        resize_shape = [int(ori_h), int(ori_w)]
        return tf.image.resize_bilinear(input, size=resize_shape, align_corners=True, name=name)

(4) I changed the kernel sizes and strides in the avg_pool layers based on ICNet:

    shape = self.layers['conv5_3/relu'].get_shape().as_list()[1:3]
    h, w = shape

    (self.feed('conv5_3/relu')
         .avg_pool(33, 65, 33, 65, name='conv5_3_pool1')
         .resize_bilinear(shape, name='conv5_3_pool1_interp'))
    (self.feed('conv5_3/relu')
         .avg_pool(17, 33, 16, 32, name='conv5_3_pool2')
         .resize_bilinear(shape, name='conv5_3_pool2_interp'))
    (self.feed('conv5_3/relu')
         .avg_pool(13, 25, 10, 20, name='conv5_3_pool3')
         .resize_bilinear(shape, name='conv5_3_pool3_interp'))
    (self.feed('conv5_3/relu')
         .avg_pool(8, 15, 5, 10, name='conv5_3_pool6')
         .resize_bilinear(shape, name='conv5_3_pool6_interp'))

With the above changes, I got 67.36% accuracy with the train_30k model and 81.06% accuracy with the train_90k model. But why is the train_90k result higher than that of the original ICNet?
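The size arithmetic behind the interp and avg_pool changes can be checked without TensorFlow. The sketch below is only an illustration: it assumes that the conv5_3 feature map is 33 x 65 for a 1025 x 2049 input (inferred from the first pooling kernel covering the whole map) and that avg_pool uses 'VALID' padding with arguments in the order (k_h, k_w, s_h, s_w). The helper names are hypothetical.

```python
def interp_size(size, shrink_factor=1, zoom_factor=1):
    """Mirror the output-size arithmetic of the interp layer quoted above."""
    size = (size - 1) * shrink_factor + 1
    size = size + (size - 1) * (zoom_factor - 1)
    return int(size)


def valid_pool_size(size, kernel, stride):
    """Output length of a pooling window under 'VALID' padding (assumed here)."""
    return (size - kernel) // stride + 1


# Halving a 1025 x 2049 input keeps the "power of two plus one" pattern.
print(interp_size(1025, shrink_factor=0.5), interp_size(2049, shrink_factor=0.5))

# Doubling a 33 x 65 map gives 65 x 129.
print(interp_size(33, zoom_factor=2), interp_size(65, zoom_factor=2))

# Assumed conv5_3 feature-map size for a 1025 x 2049 input.
feature_h, feature_w = 33, 65

# (k_h, k_w, s_h, s_w) for each avg_pool call in the comment above.
pools = [(33, 65, 33, 65), (17, 33, 16, 32), (13, 25, 10, 20), (8, 15, 5, 10)]
grids = [
    (valid_pool_size(feature_h, k_h, s_h), valid_pool_size(feature_w, k_w, s_w))
    for k_h, k_w, s_h, s_w in pools
]
print(grids)
```

Under these assumptions the four pooling branches reduce the feature map to 1 x 1, 2 x 2, 3 x 3, and 6 x 6 grids, which is consistent with the names pool1, pool2, pool3, and pool6.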

hellochick commented 6 years ago

Hey @zhuanjiao2222, you did an amazing job, and I really appreciate your help. Could you make a pull request so that I can merge your work? As for your question about the train_90k model: trainval_90k was trained on the train + validation datasets, so its accuracy will be really high when it is evaluated on the validation set.

zhuanjiao2222 commented 6 years ago

Hi @hellochick, I have made a pull request, but I only changed the ICNet model; the ICNet_bn model is unchanged.

hellochick commented 6 years ago

Hi @zhuanjiao2222, I have merged your work, thanks! By the way, I'll change a few lines to support input images of different sizes.

zhuanjiao2222 commented 6 years ago

Hi @hellochick, you're welcome!