John1231983 opened this issue 6 years ago
I think the first conv should be Conv2d. Am I right? The corrected version would look like:
self.spatial_conv = nn.Conv2d(in_channels, intermed_channels, kernel_size=3,
                              stride=1, padding=1, bias=bias)
self.bn = nn.BatchNorm2d(intermed_channels)
self.relu = nn.ReLU()
self.temporal_conv = nn.Conv3d(intermed_channels, out_channels, temporal_kernel_size,
                               stride=temporal_stride, padding=temporal_padding, bias=bias)
I think it is okay; it should be kept as Conv3d. It effectively behaves like Conv2d because one of the kernel dimensions (the temporal one) is 1.
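A quick way to see this numerically: a 3D convolution whose kernel size is 1 along the time axis computes exactly the same 2D convolution applied frame by frame. A minimal NumPy sketch (naive loops, single channel, no padding, purely illustrative rather than the repo's code):

```python
import numpy as np

def conv3d_valid(x, k):
    # naive 'valid' 3D cross-correlation; x: (T, H, W), k: (kt, kh, kw)
    kt, kh, kw = k.shape
    T, H, W = x.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = (x[t:t+kt, i:i+kh, j:j+kw] * k).sum()
    return out

def conv2d_valid(x, k):
    # naive 'valid' 2D cross-correlation; x: (H, W), k: (kh, kw)
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i+kh, j:j+kw] * k).sum()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))   # a (T, H, W) clip
k2d = rng.standard_normal((3, 3))    # spatial 3x3 kernel
k3d = k2d[None, :, :]                # same kernel viewed as (1, 3, 3)

out3d = conv3d_valid(x, k3d)
out2d = np.stack([conv2d_valid(frame, k2d) for frame in x])
assert np.allclose(out3d, out2d)     # (1, 3, 3) Conv3d == per-frame Conv2d
```

So keeping the spatial conv as a Conv3d with kernel_size=(1, 3, 3) is mathematically equivalent to a per-frame Conv2d, which is why the implementation is fine as written.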
self.conv3 = SpatioTemporalResLayer(64, 128, 3, layer_sizes[1], block_type=block_type, downsample=True) -- why downsample=True? The input has 64 channels and the output has 128; I can't understand why that requires downsampling. Can you help me? Thanks! @irhum
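For context on the downsample flag (a hedged sketch of the standard residual-network convention, not the repo's actual code): when a residual block changes the channel count (here 64 to 128) or applies a stride, the identity shortcut can no longer be added elementwise to the main branch, so the input is projected, typically with a strided 1x1x1 conv. Shapes only, with the 1x1x1 projection written as an einsum:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 16, 16))   # input: (C_in, T, H, W)

# Main-branch output: channels 64 -> 128, stride 2 in T/H/W
# (the convolutions themselves are elided; only shapes matter here).
main = rng.standard_normal((128, 4, 8, 8))

# The identity shortcut cannot be added: the shapes differ.
assert x.shape != main.shape

# Downsample branch: a strided 1x1x1 conv is per-position channel
# mixing on a stride-2 subsample of x.
w = rng.standard_normal((128, 64))         # (C_out, C_in) projection
shortcut = np.einsum('oc,cthw->othw', w, x[:, ::2, ::2, ::2])
assert shortcut.shape == main.shape

out = main + shortcut                      # residual addition is now valid
```

So downsample=True is needed precisely because the channel count changes from 64 to 128 (and the spatial/temporal resolution is halved): without the projection, the skip connection could not be summed with the block output.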
My finding is that R(2+1)D is actually slower than C3D with fp16, while with fp32, R(2+1)D is faster.
(PyTorch 1.3, CUDA 10.2, cuDNN 7.6.5)
I think the newer cuDNN is quite efficient at performing 3D convolutions on fp16 inputs.
Great implementation. Could you provide reproduced results that can be compared against the original Caffe2 implementation? Thanks.