yfeng95 / PRNet

Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network (ECCV 2018)
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yao_Feng_Joint_3D_Face_ECCV_2018_paper.pdf
MIT License

How does the input dimension change in resBlock as padding is always "SAME"? #62

Closed by meetdave06 6 years ago

meetdave06 commented 6 years ago
```python
def resBlock(x, num_outputs, kernel_size = 4, stride=1, activation_fn=tf.nn.relu,
             normalizer_fn=tcl.batch_norm, scope=None):

    assert num_outputs%2==0 #num_outputs must be divided by channel_factor(2 here)
    with tf.variable_scope(scope, 'resBlock'):
        shortcut = x
        if stride != 1 or x.get_shape()[3] != num_outputs:
            shortcut = tcl.conv2d(shortcut, num_outputs, kernel_size=1, stride=stride,
                        activation_fn=None, normalizer_fn=None, scope='shortcut')
        x = tcl.conv2d(x, num_outputs/2, kernel_size=1, stride=1, padding='SAME')
        x = tcl.conv2d(x, num_outputs/2, kernel_size=kernel_size, stride=stride, padding='SAME')
        x = tcl.conv2d(x, num_outputs, kernel_size=1, stride=1, activation_fn=None, padding='SAME', normalizer_fn=None)
        x += shortcut
        x = normalizer_fn(x)
        x = activation_fn(x)
    return x
```

In the resBlock above, all three convolution layers use padding='SAME'. How, then, do the height and width of the input decrease in the encoder?

wungemach commented 6 years ago

@meetdave06 With stride 1, "SAME" padding keeps the output spatial dimensions equal to the input spatial dimensions. With "SAME" padding and a stride greater than one, TensorFlow adds only the minimum padding needed for the last filter position to fit, so the output size is ceil(input / stride) whatever the kernel size. Here, with kernel_size=4 and stride=2 on an even-sized input, that works out to a padding of 1 on each side, and the spatial dimensions are cut in half every time a resBlock is called with stride=2.
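
To make the arithmetic concrete, here is a small sketch of the "SAME" padding rule in plain Python. The helper name `same_padding` and the input side of 256 are illustrative assumptions, not code from the repository; the formulas follow the rule TensorFlow documents for "SAME" padding.

```python
import math

def same_padding(in_size, kernel_size, stride):
    """Hypothetical helper: output size and padding along one spatial axis
    under TensorFlow-style 'SAME' padding."""
    # The output size depends only on the input size and the stride.
    out_size = math.ceil(in_size / stride)
    # Minimum total padding so that the last filter position fits.
    total_pad = max((out_size - 1) * stride + kernel_size - in_size, 0)
    # Any odd leftover pixel of padding goes on the bottom/right side.
    pad_before = total_pad // 2
    pad_after = total_pad - pad_before
    return out_size, pad_before, pad_after

# Assumed input side of 256, for illustration only.
print(same_padding(256, 4, 1))  # (256, 1, 2): stride 1 keeps the size
print(same_padding(256, 4, 2))  # (128, 1, 1): stride 2 halves it
print(same_padding(256, 1, 2))  # (128, 0, 0): the 1x1 shortcut conv
```

The last line shows why `x += shortcut` still lines up: the 1x1 shortcut convolution with the same stride produces the same spatial size as the strided main branch.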