hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

Question About Identification Module #44

Closed 1695679265 closed 3 months ago

1695679265 commented 3 months ago

Firstly, thank you for your commitment to open source. I noticed in your paper that the temporal dilation rate for the dilated convolution in your Identification Module is 4, but it seems to be set to 1 in your code. Could it be that I misunderstood something? In my understanding, the part that defines the Identification Module is in lines 25-27 of modules/resnet:

    self.spatial_aggregation1 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,1,1), groups=reduction_channel)
    self.spatial_aggregation2 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,2,2), dilation=(1,2,2), groups=reduction_channel)
    self.spatial_aggregation3 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,3,3), dilation=(1,3,3), groups=reduction_channel)

hulianyuyy commented 3 months ago

A temporal dilation of 4 covers the same temporal span as a temporal convolution with a kernel size of 9. We later found that a temporal convolution of size 9 achieves accuracy comparable to our original design with slightly lower latency, so we changed to the current design. You may still use the spatially and temporally disentangled architecture.
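
For readers following the arithmetic, here is a minimal sketch of why the two settings span the same number of frames. It assumes the original dilated design used a temporal kernel size of 3, which this thread does not state. The helper names (`effective_kernel_extent`, `tap_offsets`, `conv_output_length`) are hypothetical and not part of the CorrNet code. The output-length formula is the standard one documented for PyTorch convolution layers.

```python
def effective_kernel_extent(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def tap_offsets(kernel_size: int, dilation: int) -> list:
    """Offsets (relative to the centre) that an odd-sized kernel reads."""
    center = (kernel_size - 1) // 2
    return [dilation * (i - center) for i in range(kernel_size)]


def conv_output_length(length: int, kernel_size: int, padding: int = 0,
                       dilation: int = 1, stride: int = 1) -> int:
    """Output size along one axis of a convolution."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


if __name__ == "__main__":
    # Assumed original design: temporal kernel 3 with dilation 4.
    print(effective_kernel_extent(3, 4), tap_offsets(3, 4))
    # Current code: dense temporal kernel 9 with dilation 1.
    print(effective_kernel_extent(9, 1), tap_offsets(9, 1))
    # Padding from the snippet in the question keeps each axis the same size
    # (the input lengths 16 and 7 are arbitrary illustrative values).
    print(conv_output_length(16, 9, padding=4))            # temporal axis
    print(conv_output_length(7, 3, padding=2, dilation=2))  # spatial, dilation 2
    print(conv_output_length(7, 3, padding=3, dilation=3))  # spatial, dilation 3
```

Both settings span 9 frames, but under the kernel-3 assumption the dilated kernel reads only 3 of them (offsets -4, 0, 4), while the dense kernel reads all 9. The two are therefore matched in temporal span rather than being the same operation.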

1695679265 commented 3 months ago

Oh, oh, thank you.