irhum / R2Plus1D-PyTorch

PyTorch implementation of the R2Plus1D convolution-based ResNet architecture described in the paper "A Closer Look at Spatiotemporal Convolutions for Action Recognition"
MIT License

What is the performance in comparison with the original implementation? #2

Open John1231983 opened 6 years ago

John1231983 commented 6 years ago

Great implementation. Could you provide reproduced results that can be compared against the original Caffe2 implementation? Thanks

John1231983 commented 6 years ago

I think the first conv should be a Conv2d. Am I right? The corrected version would look like this:

    self.spatial_conv = nn.Conv2d(in_channels, intermed_channels, kernel_size=3,
                                  stride=1, padding=1, bias=bias)
    self.bn = nn.BatchNorm2d(intermed_channels)
    self.relu = nn.ReLU()
    self.temporal_conv = nn.Conv3d(intermed_channels, out_channels, temporal_kernel_size,
                                   stride=temporal_stride, padding=temporal_padding, bias=bias)
yechanp commented 5 years ago

I think it is okay; it should be kept as a Conv3d. It effectively behaves like a Conv2d because the kernel size along the temporal dimension is 1.
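For anyone reading along, here is a minimal sketch of that factorization (the layer names mirror the snippet above, but the channel numbers are illustrative, not the repo's exact code): the spatial convolution is a Conv3d with kernel_size (1, 3, 3), so it only mixes pixels within each frame, while the temporal convolution uses kernel_size (3, 1, 1), so it only mixes across frames.

    import torch
    import torch.nn as nn

    class SpatioTemporalConvSketch(nn.Module):
        """(2+1)D factorization sketch: a (1, 3, 3) spatial Conv3d followed by
        a (3, 1, 1) temporal Conv3d. Channel choices here are illustrative."""
        def __init__(self, in_channels, intermed_channels, out_channels):
            super().__init__()
            # Kernel is 1 along time, so this behaves like a 2D conv applied per frame
            self.spatial_conv = nn.Conv3d(in_channels, intermed_channels,
                                          kernel_size=(1, 3, 3), padding=(0, 1, 1))
            self.bn = nn.BatchNorm3d(intermed_channels)
            self.relu = nn.ReLU()
            # Kernel is 1 along height/width, so this only mixes across frames
            self.temporal_conv = nn.Conv3d(intermed_channels, out_channels,
                                           kernel_size=(3, 1, 1), padding=(1, 0, 0))

        def forward(self, x):  # x: (N, C, T, H, W)
            return self.temporal_conv(self.relu(self.bn(self.spatial_conv(x))))

    x = torch.randn(1, 3, 8, 112, 112)         # one 8-frame RGB clip
    y = SpatioTemporalConvSketch(3, 45, 64)(x)
    print(y.shape)                             # torch.Size([1, 64, 8, 112, 112])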

JinXiaozhao commented 4 years ago

    self.conv3 = SpatioTemporalResLayer(64, 128, 3, layer_sizes[1], block_type=block_type, downsample=True)

Why downsample=True? The input has 64 channels and the output has 128; I can't understand it. Can you help me? Thanks! @irhum
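For context, in ResNet-style layers downsample=True typically means the block adds a shortcut projection: the residual addition requires the skip branch and the main branch to have matching shapes, so when the channel count jumps from 64 to 128 (and the resolution is reduced) the identity is passed through a strided 1x1x1 convolution. A minimal sketch of such a branch, with illustrative names rather than the repo's exact code:

    import torch
    import torch.nn as nn

    # Shortcut projection for a residual block whose channels grow 64 -> 128.
    # A strided 1x1x1 conv reshapes the identity so it can be added elementwise
    # to the main branch's output; without it the addition would fail.
    downsample = nn.Sequential(
        nn.Conv3d(64, 128, kernel_size=1, stride=2),
        nn.BatchNorm3d(128),
    )

    x = torch.randn(1, 64, 8, 56, 56)
    print(downsample(x).shape)   # torch.Size([1, 128, 4, 28, 28])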

Litou1 commented 4 years ago

My finding is that R(2+1)D is actually slower than C3D with fp16 inputs; with fp32, R(2+1)D is faster.

PyTorch 1.3, CUDA 10.2, cuDNN 7.6.5

I think the newer cuDNN is quite efficient at performing 3D convolutions on fp16 inputs.
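A rough way to check this yourself is to time both convolution styles under fp32 and fp16. The sketch below compares a full 3x3x3 Conv3d (C3D-style) against the factored (1, 3, 3) + (3, 1, 1) pair on random input, using CUDA events for timing; the layer sizes and input shape are illustrative, not taken from either codebase.

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def time_module(mod, x, iters=50):
        # Warm up, then time with CUDA events so asynchronous GPU work is measured correctly
        for _ in range(10):
            mod(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            mod(x)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # ms per forward pass

    for dtype in (torch.float32, torch.float16):
        x = torch.randn(4, 64, 8, 56, 56, device="cuda", dtype=dtype)
        c3d = nn.Conv3d(64, 64, kernel_size=3, padding=1).to("cuda", dtype)
        r2plus1d = nn.Sequential(
            nn.Conv3d(64, 64, kernel_size=(1, 3, 3), padding=(0, 1, 1)),
            nn.Conv3d(64, 64, kernel_size=(3, 1, 1), padding=(1, 0, 0)),
        ).to("cuda", dtype)
        print(dtype, "C3D:", time_module(c3d, x), "ms",
              "R(2+1)D:", time_module(r2plus1d, x), "ms")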