hulianyuyy / CorrNet

Continuous Sign Language Recognition with Correlation Network (CVPR 2023)

2D-CNN or 3D-CNN? #15

Closed atonyo11 closed 7 months ago

atonyo11 commented 7 months ago

I apologize if I have misunderstood. As shown in the picture, the feature extractor is depicted as a 2D CNN. However, in the code, the ResNet-based model is built from 3D CNN components, such as conv3x3, BatchNorm3d, etc. Could you please explain why the feature extractor is described as a 2D CNN? Thank you in advance.

[image: figure depicting the feature extractor as a 2D CNN]

hulianyuyy commented 7 months ago

We always set the temporal kernel size to 1, so it is effectively still a 2D CNN.
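A minimal sketch of what a temporal kernel size of 1 implies: each output frame depends only on the matching input frame, so no information is mixed across time. The channel counts, kernel size, and tensor sizes below are arbitrary assumptions chosen for illustration, not values taken from the CorrNet code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 3D convolution whose kernel covers a single frame (temporal size 1).
# Channel counts and spatial kernel size are illustrative assumptions.
conv = nn.Conv3d(3, 8, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False)

# Clip layout: (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 4, 16, 16)
changed = clip.clone()
changed[:, :, 2] += 1.0  # perturb only frame index 2

with torch.no_grad():
    # Largest absolute output difference per frame.
    diff = (conv(clip) - conv(changed)).abs().amax(dim=(0, 1, 3, 4))

# Indices of the output frames that changed at all.
affected = (diff > 1e-6).nonzero().flatten().tolist()
print(affected)
```

Only the perturbed frame should show up in `affected`. With a temporal kernel size of 3, the neighbouring frames would be affected as well.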

atonyo11 commented 7 months ago

> temporal kernel size as 1

As I understand it, with kernel_size=(1,7,7) in the line self.conv1 = nn.Conv3d(3, 64, kernel_size=(1,7,7), stride=(1,2,2), padding=(0,3,3), bias=False), the layer functions as a 2D convolution. Is that right?

hulianyuyy commented 7 months ago

Yes, that's right.
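The equivalence can be checked numerically. The sketch below builds the conv1 layer with the arguments quoted above, copies its weights into an nn.Conv2d, and compares the outputs after folding the frame axis into the batch axis. The input size and the conv2d, frames, and same names are assumptions made for this illustration and do not come from the repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The layer from the question: temporal kernel size, stride and padding are 1, 1, 0.
conv3d = nn.Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2),
                   padding=(0, 3, 3), bias=False)
# Hypothetical 2D counterpart with the same spatial settings.
conv2d = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Conv3d weight has shape (64, 3, 1, 7, 7); drop the singleton temporal axis.
with torch.no_grad():
    conv2d.weight.copy_(conv3d.weight.squeeze(2))

# Arbitrary input clip: (batch, channels, frames, height, width).
x = torch.randn(2, 3, 5, 32, 32)
n, c, t, h, w = x.shape

with torch.no_grad():
    out3d = conv3d(x)
    # Treat every frame as an independent image: (batch * frames, C, H, W).
    frames = x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    out2d = conv2d(frames)
    # Restore the (batch, channels, frames, H', W') layout.
    out2d = out2d.reshape(n, t, *out2d.shape[1:]).permute(0, 2, 1, 3, 4)

same = torch.allclose(out3d, out2d, atol=1e-4)
print(out3d.shape, same)
```

One practical reason to keep the Conv3d form is that the network can consume (batch, channels, frames, height, width) tensors directly, without reshaping frames into the batch axis at every layer.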

atonyo11 commented 7 months ago

I got it. Thank you very much!