On lines 240, 245, and 252 of the ResNet50 implementation the default value of (2, 2) for stride is used. In the conv_block on lines 114 and 131 (for both the main path and the shortcut) a filter of size (1, 1) is used with the (2, 2) stride. However, wouldn't that ignore half of the values because the filter is smaller than the stride? I propose that the conv_block should use zero_padding of 1 and then a (3,3) filter instead. This way all the sizes are kept the same and no information is lost with either a stride of (1,1) or (2,2).
Many deep learning architectures are evolving to avoid computational redundancy. A stride of 2 should not be viewed in terms of information loss. The 1x1 filter with a 2x2 stride is simply one of several down-sampling operations, comparable to pooling or any other strided convolution. Please see the original papers.
Keras applications preserve the hyperparameters and the architectures proposed by the original authors.
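To make the down-sampling point concrete, here is a minimal NumPy sketch (not the Keras code itself). The helper names `conv1x1_stride` and `conv_output_size` and the tensor shapes are assumptions chosen for illustration. It shows that a 1x1 convolution with stride 2 is the same as keeping every second row and column and then mixing channels per pixel, and that both the 1x1 filter and the proposed padded 3x3 filter give the same output size at stride 2.

```python
import numpy as np


def conv1x1_stride(x, w, stride=2):
    """1x1 convolution with a stride on an (H, W, C_in) input (hypothetical helper).

    w has shape (C_in, C_out). A 1x1 kernel only mixes channels, so the
    strided version is a spatial subsample followed by a per-pixel matmul.
    """
    return x[::stride, ::stride, :] @ w


def conv_output_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 256))   # illustrative feature map
w = rng.standard_normal((256, 128))      # illustrative 1x1 kernel weights

y = conv1x1_stride(x, w, stride=2)

# Changing only the skipped positions (odd rows and odd columns here)
# leaves the output untouched: those pixels are never read.
x_changed = x.copy()
x_changed[1::2, :, :] += 1.0
x_changed[:, 1::2, :] += 1.0
y_changed = conv1x1_stride(x_changed, w, stride=2)

# Both variants halve the spatial size at stride 2.
size_1x1 = conv_output_size(56, kernel=1, stride=2, padding=0)
size_3x3 = conv_output_size(56, kernel=3, stride=2, padding=1)
```

In this sketch the strided 1x1 layer reads one position in every 2x2 window, so the skipped activations do not reach that layer's output. A padded 3x3 filter would read every position, at roughly nine times the multiply-adds per output pixel for the same channel counts.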