naver / r2d2


Is the model structure in the code the same as described in the paper? #31

Closed kkzh2313 closed 3 years ago

kkzh2313 commented 3 years ago

While debugging the code in train.py, I found the model structure of Quad_L2Net as follows:

```
ModuleList(
  (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (5): ReLU(inplace=True)
  (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (8): ReLU(inplace=True)
  (9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (11): ReLU(inplace=True)
  (12): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
  (13): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (14): ReLU(inplace=True)
  (15): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4))
  (16): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (17): ReLU(inplace=True)
  (18): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(2, 2), dilation=(4, 4))
  (19): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (20): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(4, 4), dilation=(8, 8))
  (21): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
  (22): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(8, 8), dilation=(16, 16))
)
```

According to Figure 2 of the paper, dilated convolution appears only at the green arrows. In the model above, however, all of the later convolutional layers are dilated, and the dilation rate keeps doubling. 1) Is the model structure I obtained while debugging correct? 2) How should I understand Figure 2 in the paper?
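For what it's worth, the dilation pattern in the printed structure can be checked with a quick receptive-field calculation: since every layer has stride 1, each convolution grows the receptive field by `(kernel_size - 1) * dilation`, so the doubling dilations stand in for the downsampling that a strided network would otherwise use. A minimal sketch (the kernel/dilation pairs below are just read off the `ModuleList` printout above; BatchNorm and ReLU are skipped as they do not affect the receptive field):

```python
# Receptive-field growth for a stack of stride-1 convolutions,
# using the (kernel_size, dilation) pairs from the printed Quad_L2Net.
layers = [
    (3, 1), (3, 1), (3, 1),   # convs 0, 3, 6
    (3, 2), (3, 2),           # convs 9, 12
    (3, 4),                   # conv 15
    (2, 4), (2, 8), (2, 16),  # convs 18, 20, 22
]

rf = 1  # receptive field of a single input pixel
for k, d in layers:
    # With stride 1 everywhere, the output sampling step stays 1,
    # so each layer adds (k - 1) * d to the receptive field.
    rf += (k - 1) * d
    print(f"kernel={k} dilation={d:2d} -> receptive field {rf}x{rf}")
# final receptive field: 51x51
```

So the full stack sees a 51x51 input patch while keeping the feature map at full resolution, which is consistent with the idea of replacing subsampling by dilation rather than dilating only a single layer.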

UncleChuanchuan commented 3 years ago

I got the same structure as yours from the code. Maybe figure 2 has some mistakes.

jerome-revaud commented 3 years ago

Hmm, yes, you're right. For some reason, Figure 2 in the paper does not match the actual architecture in terms of dilation parameters.

The structure provided in our git is correct; the paper is erroneous.