prlz77 / ResNeXt.pytorch

Reproduces ResNet-V3 with pytorch
MIT License

Is the D right? #14

Open · pppLang opened this issue 5 years ago

pppLang commented 5 years ago

https://github.com/prlz77/ResNeXt.pytorch/blob/48c19fba72a0d3971ba9edd6c4e61f860c3df519/models/model.py#L39

Hi, this may be a stupid question. I have not read the original paper, but I think the 3×3 conv layer should have fewer channels than the 1×1 conv layers, to reduce the computational complexity.

I printed the channels after line 39:

print(widen_factor, in_channels, D, out_channels)

and the output (widen_factor, in_channels, D, out_channels), one line per bottleneck:

    4   64   512  256
    4  256   512  256
    4  256   512  256
    4  256  1024  512
    4  512  1024  512
    4  512  1024  512
    4  512  2048 1024
    4 1024  2048 1024
    4 1024  2048 1024

Is that right? Thanks for answering.
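
For reference, a minimal standalone sketch that reproduces these D values. It assumes (my reading of the linked line, not a quote from the repo) that D = cardinality * int(base_width * width_ratio) with width_ratio = out_channels / (widen_factor * 64):

    # Sketch only: this formula is my reading of model.py line 39.
    # Defaults below are the ones from train.py used for the print above.
    def bottleneck_D(out_channels, cardinality=8, base_width=64, widen_factor=4):
        width_ratio = out_channels / (widen_factor * 64.0)
        return cardinality * int(base_width * width_ratio)

    for out_channels in (256, 512, 1024):
        print(out_channels, bottleneck_D(out_channels))
    # 256 512
    # 512 1024
    # 1024 2048  <- matches the D column above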

prlz77 commented 5 years ago

Yes, it is right, since these channel counts are divided by the number of groups in the convolution.
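
To make the grouping concrete, here is a minimal sketch in plain PyTorch (not code from this repo) comparing a grouped 3×3 conv at D = 512 with cardinality 8 against its ungrouped counterpart:

    import torch.nn as nn

    def n_params(module):
        return sum(p.numel() for p in module.parameters())

    # 512-channel 3x3 conv, split into 8 groups vs. dense (no groups):
    grouped = nn.Conv2d(512, 512, kernel_size=3, padding=1, groups=8, bias=False)
    dense = nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False)

    print(n_params(grouped))  # 512 * (512 // 8) * 3 * 3 = 294912
    print(n_params(dense))    # 512 * 512 * 3 * 3 = 2359296, i.e. 8x more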

pppLang commented 5 years ago

Oh, thanks a lot for the answer~

But, emmmm, I also looked at another implementation: https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnext.py

There, the 3×3 conv has fewer channels than the following 1×1 conv, even though it is also divided into groups. I checked the network architecture with print(net); maybe you want to compare the two implementations?

Thanks again~

prlz77 commented 5 years ago

Please make sure that you are executing with the correct command-line parameters. For --cardinality 32 --widen_factor 4 --depth 50 --base_width 4 I get:

  (stage_1): Sequential(
    (stage_1_bottleneck_0): ResNeXtBottleneck(
      (conv_reduce): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn_reduce): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_expand): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn_expand): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (shortcut_conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (shortcut_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )

Which corresponds exactly to the paper (https://arxiv.org/pdf/1611.05431.pdf): [image: the corresponding block from the paper]
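
A quick way to check the 32×4d correspondence from the block above (a plain-PyTorch sketch, not repo code): the weight of a grouped conv has shape (out_channels, in_channels // groups, kH, kW), so each of the 32 paths sees 4 channels:

    import torch.nn as nn

    # The grouped 3x3 conv from stage_1 above: 128 channels, 32 groups.
    conv = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32, bias=False)
    print(conv.weight.shape)
    # torch.Size([128, 4, 3, 3]) -> 32 paths of width 4, the "32x4d" template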

pppLang commented 5 years ago

Yes, I think it is right with --cardinality 32 --widen_factor 4 --depth 50 --base_width 4.

But with the defaults in your train.py, --cardinality 8 --widen_factor 4 --depth 29 --base_width 64, I got:

  (stage_1): Sequential(
    (stage_1_bottleneck_0): ResNeXtBottleneck(
      (conv_reduce): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn_reduce): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=8, bias=False)
      (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv_expand): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn_expand): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (shortcut_conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (shortcut_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )

I think that is wrong: in a bottleneck, the channel count of the middle conv should be half that of the output conv, but under this parameter setting it is twice as large.

Sorry, I don't know how to add line breaks here, but you can test it yourself.

prlz77 commented 5 years ago

ResNeXt bottlenecks are a bit different: if you ask for a base width of 64 and a cardinality of 8, that is 64*8 = 512 channels. These 512 channels are then divided into 8 groups of 64.
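
The same arithmetic as a plain-PyTorch sketch (not repo code):

    import torch.nn as nn

    cardinality, base_width = 8, 64
    D = cardinality * base_width  # 512 channels in the grouped 3x3 conv

    conv = nn.Conv2d(D, D, kernel_size=3, padding=1, groups=cardinality, bias=False)
    print(conv.weight.shape)
    # torch.Size([512, 64, 3, 3]) -> 8 groups, each convolving 64 channels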

Maybe I am wrong; could you run the original Torch code and compare it with mine to make sure?

pppLang commented 5 years ago

Running the Torch code is a bit troublesome, but I think you are right.

I tested a total of three implementations:

- yours: https://github.com/prlz77/ResNeXt.pytorch/blob/master/models/model.py
- https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnext.py
- https://github.com/D-X-Y/ResNeXt-DenseNet/blob/master/models/resnext.py

Your network architecture is the same as the third one, but different from the second.

After I finish my work, I will check again. Thanks a lot for your answer~