rishizek / tensorflow-deeplab-v3

DeepLabv3 built in TensorFlow
MIT License
286 stars 102 forks

[Q] Why is there "resize_bilinear" for "image_level_features"? (I think it does nothing..) #37

Open ywpkwon opened 4 years ago

ywpkwon commented 4 years ago

In the function "atrous_spatial_pyramid_pooling" (deeplab_model.py, line 21), there is an "image_level_features" block (lines 54--61):

        # (b) the image-level features
        with tf.variable_scope("image_level_features"):
          # global average pooling
          image_level_features = tf.reduce_mean(inputs, [1, 2], name='global_average_pooling', keepdims=True)
          # 1x1 convolution with 256 filters (and batch normalization)
          image_level_features = layers_lib.conv2d(image_level_features, depth, [1, 1], stride=1, scope='conv_1x1')
          # bilinearly upsample features
          image_level_features = tf.image.resize_bilinear(image_level_features, inputs_size, name='upsample')

I think "image_level_features" is the same size as "inputs", since it is just a reduce_mean with keepdims. Also, inputs_size = tf.shape(inputs)[1:3].

=> If they are the same size, why would one do tf.image.resize_bilinear(image_level_features, inputs_size)?

haydengunraj commented 4 years ago

As the comments explain, the reduce_mean call performs global average pooling across dimensions 1 (height) and 2 (width). This results in a feature map of size Nx1x1xC, which is then passed to a 1x1 conv (no shape change). As such, the tf.image.resize_bilinear call is used to upsample the spatial dimensions to match the input dimensions so that the branches can be concatenated.
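The shape progression can be sketched with NumPy (hypothetical shapes N=2, H=4, W=5, C=3; `mean(..., keepdims=True)` mirrors the `tf.reduce_mean(..., keepdims=True)` call in the snippet above):

```python
import numpy as np

# Hypothetical input batch: N=2, H=4, W=5, C=3
inputs = np.random.rand(2, 4, 5, 3).astype(np.float32)

# Global average pooling across height (axis 1) and width (axis 2),
# keeping the reduced dimensions -- mirrors tf.reduce_mean(..., keepdims=True)
pooled = inputs.mean(axis=(1, 2), keepdims=True)

print(pooled.shape)  # (2, 1, 1, 3) -- NOT the same size as inputs
```

The bilinear upsample then restores the spatial dims from 1x1 back to HxW so the image-level branch can be concatenated with the other ASPP branches.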

ywpkwon commented 4 years ago

@haydengunraj , thanks for the explanation. I was mistakenly confused by the keepdims. So, if tf.image.resize_bilinear converts A = Nx1x1xC to B = NxHxWxC, aren't all the HxW values equal to the single 1x1 value (channel-wise)? For example, A[n, 0, 0, c] == B[n, :, :, c] for any n and c?

Then, isn't it the same as tf.tile in this case?
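For a 1x1 source map there is indeed only one pixel to interpolate from, so a bilinear resize yields a constant HxW plane per channel, which matches what tiling (or broadcasting) would produce. A minimal NumPy sketch of that equivalence, with hypothetical shapes:

```python
import numpy as np

N, H, W, C = 2, 4, 5, 3
a = np.random.rand(N, 1, 1, C).astype(np.float32)  # pooled features, Nx1x1xC

# Tiling the single spatial value across HxW...
tiled = np.tile(a, (1, H, W, 1))

# ...gives the same constant planes as broadcasting, which is what a
# bilinear resize of a 1x1 map reduces to (only one source pixel to sample).
broadcast = np.broadcast_to(a, (N, H, W, C))

print(np.array_equal(tiled, broadcast))  # True
print(tiled.shape)                       # (2, 4, 5, 3)
```

So for this particular 1x1 case the two ops are numerically interchangeable; resize_bilinear is simply the general-purpose choice when the target size is a dynamic tensor.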