prlz77 / ResNeXt.pytorch

Reproduces ResNet-V3 with pytorch
MIT License

Question about the number of channels #4

Closed zl1994 closed 7 years ago

zl1994 commented 7 years ago

Hi, may I ask a question? Why are the output channels of conv_reduce four times the number of input channels, and how can it play the role of reducing dimensions before the 3x3 convolution?

    CifarResNeXt (
      (conv_1_3x3): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
      (stage_1): Sequential (
        (stage_1_bottleneck_0): ResNeXtBottleneck (
          (conv_reduce): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn_reduce): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
          (conv_conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=8, bias=False)
          (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
          (conv_expand): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn_expand): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
          (shortcut): Sequential (
            (shortcut_conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (shortcut_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
          )
        )

prlz77 commented 7 years ago

There are two factors here: the cardinality and the widen factor. conv_reduce eliminates ("reduces") the widen factor, but the same layer also expands by the cardinality (the number of groups). Since here cardinality > widen factor, the net result is an expansion. I know the "reduce" term might be misleading, but I kept it to match the notation of the plain ResNet.
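The channel arithmetic above can be sketched in a few lines (a rough sketch; the variable names `cardinality`, `base_width`, etc. are illustrative, not the repo's exact ones):

```python
# Hypothetical sketch of the channel arithmetic for the first CIFAR stage.
in_channels = 64        # output of the stem conv_1_3x3
cardinality = 8         # number of groups in the 3x3 grouped convolution
base_width = 64         # width per group

# conv_reduce: 1x1 conv feeding the grouped 3x3 convolution.
# It "reduces" per group, but expands overall because cardinality > widen factor.
bottleneck_width = cardinality * base_width   # 8 * 64 = 512

# conv_expand: 1x1 conv back down to the block's output width.
out_channels = 256

print(in_channels, "->", bottleneck_width, "->", out_channels)  # 64 -> 512 -> 256
```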

I hope this answers your question!

zl1994 commented 7 years ago

@prlz77 Thanks for responding so patiently. But I wonder why you made such modifications, since they decrease the efficiency of the model, which goes against the original intention of ResNeXt.

prlz77 commented 7 years ago

No problem! I did not make any modification; the code should match the original. The expected behaviour is that with cardinality=8 the 1x1 convolution outputs 512 feature maps (8*64) for the first block. I don't see a difference with respect to the original implementation in LuaTorch:

        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
          |      (1): cudnn.SpatialConvolution(64 -> 512, 1x1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (512)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (512)
          |      (6): cudnn.ReLU
          |      (7): cudnn.SpatialConvolution(512 -> 256, 1x1) without bias
          |      (8): nn.SpatialBatchNormalization (4D) (256)
          |    }

If you still think there is something different, please tell me the parameters you are using (cardinality, width, etc.) so I can check whether there is an error in the code.
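For reference, the block in the listing above can be written in PyTorch roughly as follows (a minimal sketch matching the printed shapes, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """Minimal sketch of the ResNeXt bottleneck printed above (illustrative only)."""
    def __init__(self, in_ch=64, bottleneck=512, out_ch=256, cardinality=8):
        super().__init__()
        # 1x1 "reduce": 64 -> 512 (an expansion here, since cardinality=8 > widen factor)
        self.conv_reduce = nn.Conv2d(in_ch, bottleneck, 1, bias=False)
        self.bn_reduce = nn.BatchNorm2d(bottleneck)
        # grouped 3x3: 512 -> 512 with 8 groups (the cardinality)
        self.conv_conv = nn.Conv2d(bottleneck, bottleneck, 3, padding=1,
                                   groups=cardinality, bias=False)
        self.bn = nn.BatchNorm2d(bottleneck)
        # 1x1 "expand": 512 -> 256
        self.conv_expand = nn.Conv2d(bottleneck, out_ch, 1, bias=False)
        self.bn_expand = nn.BatchNorm2d(out_ch)
        # projection shortcut: 64 -> 256
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = F.relu(self.bn_reduce(self.conv_reduce(x)))
        out = F.relu(self.bn(self.conv_conv(out)))
        out = self.bn_expand(self.conv_expand(out))
        return F.relu(out + self.shortcut(x))
```

With a CIFAR-sized input of shape (1, 64, 32, 32) the output is (1, 256, 32, 32), consistent with the shapes in both printouts.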

zl1994 commented 7 years ago

@prlz77 Yes, you are right! I have figured it out. Thank you for your help; I'll close the issue.