prlz77 / ResNeXt.pytorch

Reproduces ResNet-V3 with pytorch
MIT License
505 stars, 121 forks

question about the dimension of the net #21

Closed Agito555 closed 2 years ago

Agito555 commented 2 years ago

If the input to the net has dimensions 64x3x224x224, where 64 is the batch size, 3 is the number of channels, and 224 is the size of the original image, then when I run the code the output of the net has dimensions 4802x10, where 10 is the number of classes to predict. Is this output correct? Shouldn't the output be 64x10? Maybe I got something wrong?

prlz77 commented 2 years ago

Hi @Agito555, the problem is that the network expects images of size 32x32, so feeding it 224x224 images produces bigger feature maps, which breaks the average pooling step: https://github.com/prlz77/ResNeXt.pytorch/blob/master/models/model.py#L138

I suggest you just replace the line linked above with x.mean((2, 3)), which averages over the two spatial dimensions whatever their size.

Hope this helps!

Pau
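To make the shape arithmetic behind this reply concrete, here is a minimal sketch in plain Python. It is not the repository's code. The function names are hypothetical, and the defaults (a total downsampling factor of 4, then an average pool with kernel 8 and stride 1, followed by a flatten to one row per spatial position) are assumptions about the CIFAR-style model that should be checked against models/model.py. The floor division also assumes the image size is divisible by 4.

```python
def pooled_rows(batch, image_size, total_stride=4, pool_kernel=8, pool_stride=1):
    """Rows left after a fixed-window average pool followed by a flatten
    to (-1, channels). All defaults are assumptions, not verified values."""
    # Spatial size of the last feature map (assumes image_size % total_stride == 0).
    feat = image_size // total_stride
    # Output size of an unpadded pooling window sliding over that feature map.
    out = (feat - pool_kernel) // pool_stride + 1
    # Flattening to (-1, channels) turns every remaining spatial position into a row.
    return batch * out * out


def global_mean_rows(batch):
    """Rows left after averaging over both spatial dimensions, as x.mean((2, 3)) does."""
    return batch


print(pooled_rows(64, 32))    # 8x8 feature map, pooled to 1x1: one row per image
print(pooled_rows(64, 224))   # 56x56 feature map, pooled to 49x49: many rows per image
print(global_mean_rows(64))   # always one row per image
```

Under these assumptions a 32x32 input collapses to exactly one row per image, while a 224x224 input leaves 49 x 49 = 2401 rows per image, so the first dimension of the output no longer equals the batch size. Averaging over the spatial dimensions directly avoids the dependence on input size.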

Agito555 commented 2 years ago

Yeah, thanks for your reply. I took it for granted that the input image size was the same as the size in the paper. I would also like to know whether this network architecture is a little different from the ResNeXt-50 architecture in the paper.

prlz77 commented 2 years ago

Well, in the paper they also try CIFAR (Table 7), for which the input is 32x32. There should be no differences with respect to the CIFAR model. If there are, they could be due to many reasons, such as differences in the frameworks used to train the models.