keras-team / keras-applications

Reference implementations of popular deep learning models.

ResNet50 stride is greater than filter size #81

Closed jg4ye closed 5 years ago

jg4ye commented 5 years ago

On lines 240, 245, and 252 of the ResNet50 implementation, the default stride of (2, 2) is used. In conv_block, on lines 114 and 131 (covering both the main path and the shortcut), a (1, 1) filter is applied with that (2, 2) stride. Wouldn't that ignore half of the values, since the filter is smaller than the stride? I propose that conv_block instead use zero padding of 1 followed by a (3, 3) filter. That way the sizes stay the same and no information is lost with a stride of either (1, 1) or (2, 2).
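To make the question concrete, here is a minimal pure-Python sketch (not the Keras code) that lists which input positions along one axis are read by any filter window. The helper `positions_read` and the axis length of 8 are illustrative assumptions. It compares the (1, 1) filter with stride 2 against the proposed (3, 3) filter with zero padding of 1 and stride 2.

```python
def positions_read(size, kernel, stride, padding=0):
    """Return the input indices (along one axis) touched by any filter window.

    Hypothetical helper for illustration; indices are in unpadded coordinates.
    """
    n_out = (size + 2 * padding - kernel) // stride + 1
    touched = set()
    for o in range(n_out):
        start = o * stride - padding  # window start in unpadded coordinates
        for i in range(start, start + kernel):
            if 0 <= i < size:  # ignore positions that fall in the zero padding
                touched.add(i)
    return touched


size = 8  # illustrative axis length, not taken from the model
one_by_one = positions_read(size, kernel=1, stride=2)
three_by_three = positions_read(size, kernel=3, stride=2, padding=1)

print(sorted(one_by_one))      # [0, 2, 4, 6]
print(sorted(three_by_three))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Along each axis the (1, 1) filter reads every other position, so in two dimensions it reads one position in four, while the padded (3, 3) filter reads every position.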

taehoonlee commented 5 years ago

@jg4ye,

  1. Many deep learning architectures are evolving to avoid computational redundancy. A stride of 2 should not be viewed in terms of information loss: the 1x1 filter with a 2x2 stride is just one of several down-sampling operations. Please see the original papers.
  2. Keras Applications preserves the hyperparameters and architectures proposed by the original authors.
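The point about computational redundancy can be sketched with some arithmetic. The example below is a rough illustration under stated assumptions, not a measurement of the Keras model: the helpers `conv_output_size` and `conv_macs` are hypothetical, the feature-map size (56) and channel counts (256 in, 128 out) are illustrative values, and biases are ignored. It shows that both variants down-sample to the same output size, while the (3, 3) variant needs nine times the multiply-accumulate operations.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(size, kernel, stride, padding, c_in, c_out):
    """Multiply-accumulates for a square convolution on a square input (no bias)."""
    out = conv_output_size(size, kernel, stride, padding)
    return out * out * kernel * kernel * c_in * c_out


# Illustrative values (assumptions, not read from the implementation).
size, c_in, c_out = 56, 256, 128

out_1x1 = conv_output_size(size, kernel=1, stride=2, padding=0)
out_3x3 = conv_output_size(size, kernel=3, stride=2, padding=1)

macs_1x1 = conv_macs(size, 1, 2, 0, c_in, c_out)
macs_3x3 = conv_macs(size, 3, 2, 1, c_in, c_out)

print(out_1x1, out_3x3)      # 28 28
print(macs_3x3 // macs_1x1)  # 9
```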