Closed 1695679265 closed 3 months ago
A temporal dilation of 4 covers the same temporal span as a temporal convolution with a kernel size of 9. We later found that a temporal convolution with a kernel size of 9 achieves accuracy comparable to our original design with slightly lower latency, so we changed to the current design. You may still use the spatially and temporally disentangled architecture.
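To make the equivalence in the reply concrete, here is a minimal sketch of the receptive-field arithmetic. It assumes the dilated design used a temporal kernel size of 3, which the thread does not state, and the helper names and the clip length of 16 frames are hypothetical. The output-length formula is the standard one for dilated convolutions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by a (possibly dilated) 1D kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_output_length(length: int, kernel_size: int, padding: int = 0,
                       dilation: int = 1, stride: int = 1) -> int:
    """Output length of a convolution along one axis."""
    span = effective_kernel_size(kernel_size, dilation)
    return (length + 2 * padding - span) // stride + 1


# Assumption: the dilated variant uses temporal kernel size 3 with dilation 4.
dilated_span = effective_kernel_size(kernel_size=3, dilation=4)
# The released code uses temporal kernel size 9 with dilation 1.
dense_span = effective_kernel_size(kernel_size=9, dilation=1)

# Hypothetical clip length; padding 4 keeps the temporal length in both cases.
num_frames = 16
dilated_out = conv_output_length(num_frames, kernel_size=3, padding=4, dilation=4)
dense_out = conv_output_length(num_frames, kernel_size=9, padding=4, dilation=1)

print(dilated_span, dense_span)  # both spans cover 9 frames
print(dilated_out, dense_out)    # both outputs keep 16 frames
```

Under that assumption the two variants cover the same 9 frames, but they are not identical operations: the dilated kernel samples 3 of those frames, while the size-9 kernel has a weight for every frame.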
Oh, oh, thank you.
Firstly, thank you for your commitment to open source. I noticed in your paper that the temporal dilation rate for the dilated convolution in your Identification Module is 4, but it seems to be set to 1 in your code. Could it be that I misunderstood something? In my understanding, the part that defines the Identification Module is in lines 25-27 of modules/resnet:
self.spatial_aggregation1 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,1,1), groups=reduction_channel)
self.spatial_aggregation2 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,2,2), dilation=(1,2,2), groups=reduction_channel)
self.spatial_aggregation3 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,3,3), dilation=(1,3,3), groups=reduction_channel)