LeoniekevandenBulk opened this issue 6 years ago
Hi, thanks for your reply.
I understand that a SpatioTemporalConv is needed for the R(2+1)D network, but I don't think the original authors use it in their downsample step specifically, as can be found here. Your downsample step, however, does use a SpatioTemporalConv. Could you explain why?
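To make the comparison concrete, here is a rough sketch of the two shortcut variants being discussed (the module layout, strides, and intermediate channel count are illustrative, not copied from either repo):

```python
import torch.nn as nn

# Variant A -- the original paper's shortcut: a single plain 1x1x1 Conv3d
# projection (a purely linear map) followed by batch norm.
shortcut_plain = nn.Sequential(
    nn.Conv3d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm3d(128),
)

# Variant B -- a (2+1)D-factorized shortcut: the projection is split into a
# "spatial" conv and a "temporal" conv with BN + ReLU in between, the way a
# SpatioTemporalConv factorizes any kernel. Note this puts a nonlinearity on
# the shortcut path, so it is not equivalent to Variant A.
intermediate = 64  # illustrative; the (2+1)D formula would pick this value
shortcut_factorized = nn.Sequential(
    nn.Conv3d(64, intermediate, kernel_size=1, stride=(1, 2, 2), bias=False),
    nn.BatchNorm3d(intermediate),
    nn.ReLU(inplace=True),
    nn.Conv3d(intermediate, 128, kernel_size=1, stride=(2, 1, 1), bias=False),
)
```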
In your R(2+1)D network code:

```python
self.conv3 = SpatioTemporalResLayer(64, 128, 3, layer_sizes[1], block_type=block_type, downsample=True)
```

The docstring for that layer says:

> downsample (bool, optional): If True, the first block in layer will implement downsampling. Default: False

The output size is 128 and the input size is 64, so why is downsample=True?
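For context, here is my understanding of why the channel change alone forces a projection in ResNet-style layers, as a minimal sketch with illustrative names (not the repo's actual code):

```python
import torch
import torch.nn as nn

class ResBlockSketch(nn.Module):
    """Minimal residual block: the shortcut must be projected whenever
    in_channels != out_channels so that x + F(x) is well defined."""
    def __init__(self, in_channels, out_channels, downsample=False):
        super().__init__()
        stride = 2 if downsample else 1
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm3d(out_channels)
        self.shortcut = None
        if downsample or in_channels != out_channels:
            # 1x1x1 projection so the residual sum has matching shapes
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm3d(out_channels),
            )

    def forward(self, x):
        identity = x if self.shortcut is None else self.shortcut(x)
        return torch.relu(self.bn(self.conv(x)) + identity)

# 64 -> 128 channels: without a projection shortcut the residual addition
# would fail, which is presumably why the first block of conv3 needs
# downsample=True.
block = ResBlockSketch(64, 128, downsample=True)
out = block(torch.randn(1, 64, 8, 56, 56))  # -> (1, 128, 4, 28, 28)
```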
Thanks! @jfzhang95
I was checking your PyTorch implementation of the R2Plus1D model against the implementation in Caffe2 in the repository of the original paper (https://github.com/facebookresearch/VMZ), and I was wondering why you chose to implement the downsample step as a SpatioTemporalConv layer, while in the original implementation they seem to use only one Conv3D layer. They have coded it as follows:
```python
if (num_filters != input_filters) or down_sampling:
    shortcut_blob = self.model.ConvNd(
        shortcut_blob,
        'shortcut_projection_%d' % self.comp_count,
        input_filters,
        num_filters,
        [1, 1, 1],
        weight_init=("MSRAFill", {}),
        strides=use_striding,
        no_bias=self.no_bias,
    )
    if spatial_batch_norm:
        shortcut_blob = self.model.SpatialBN(
            shortcut_blob,
            'shortcut_projection_%d_spatbn' % self.comp_count,
            num_filters,
            epsilon=1e-3,
            is_test=self.is_test,
        )
```
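For comparison, I believe that Caffe2 shortcut corresponds roughly to the following in PyTorch (a sketch under my reading of the code; the function name and the striding default are my assumptions, not VMZ's API):

```python
import torch.nn as nn

def make_shortcut(input_filters, num_filters, down_sampling,
                  use_striding=(2, 2, 2)):
    """Rough PyTorch analogue of the shortcut above: a single 1x1x1 Conv3d
    projection plus batch norm, with no nonlinearity and no (2+1)D
    factorization on the shortcut path."""
    if num_filters == input_filters and not down_sampling:
        return nn.Identity()  # plain identity shortcut
    return nn.Sequential(
        nn.Conv3d(input_filters, num_filters, kernel_size=1,
                  stride=use_striding if down_sampling else 1, bias=False),
        nn.BatchNorm3d(num_filters, eps=1e-3),
    )
```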
Was this design choice deliberate, and if so, could you perhaps tell me why?
Thanks!