pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

BUG: The torchvision.models.alexnet is incorrect #549

Open yaox12 opened 6 years ago

yaox12 commented 6 years ago

In self.classifier, the correct order of layers should be fc6-relu-dropout-fc7-relu-dropout-fc8, which means that each dropout layer should come after the fc and relu layers.

This does not matter for inference. However, when you fine-tune the AlexNet model on other datasets, the incorrect layer sequence will result in an accuracy drop of 2 percent or more.

Can I make a PR to fix it? The pre-trained model URL needs to be updated too.
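A minimal sketch of the two orderings under discussion (standard AlexNet classifier sizes; not the exact torchvision source):

```python
import torch.nn as nn

# Order as currently shipped: dropout placed before each fully connected layer.
shipped = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)

# Proposed order: fc6-relu-dropout-fc7-relu-dropout-fc8.
proposed = nn.Sequential(
    nn.Linear(256 * 6 * 6, 4096),  # fc6
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),         # fc7
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 1000),         # fc8
)
```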

fmassa commented 6 years ago

I had a quick look and it seems that you might be right. Following @soumith's comment on https://github.com/pytorch/vision/pull/463 , I think we will be living with this mistake, as people have already built models and got results based on the models that we provided, and changing that now would not be great.

Thanks for catching this issue though!

yaox12 commented 6 years ago

Adjusting the order of layers won't break the pre-trained models much; we only need to rename some parameter keys, for example moving classifier.1.weight to classifier.0.weight. There is no need to re-train the model: we load the previous checkpoint, adjust the order of layers, and save it to a new checkpoint file. The forward inference will not change. As PyTorch has not reached version 1.0, I think making some corrections is as important as keeping backwards compatibility. Though AlexNet is outdated today, research on Transfer Learning and Domain Adaptation still takes it as a typical example. I hope we can fix it.
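The key renaming could be sketched like this (a hypothetical mapping, assuming the Linear layers move from indices 1 and 4 to 0 and 3 while classifier.6 keeps its index):

```python
# Hypothetical sketch of remapping checkpoint keys after reordering the
# classifier; operates on a plain state-dict mapping, no re-training needed.
OLD_TO_NEW = {
    "classifier.1": "classifier.0",
    "classifier.4": "classifier.3",
}

def remap_state_dict(state_dict):
    """Return a copy of state_dict with the classifier keys renamed."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in OLD_TO_NEW.items():
            if key.startswith(old_prefix + "."):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped
```

One would load the old checkpoint with `torch.load`, pass its state dict through a remapping like this, and save the result; the forward computation is unchanged.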

soumith commented 6 years ago

yea, this seems worth fixing, esp. if the forward inference doesn't change. The other issue was around changing the model from 256 to 384 channels on a particular layer, so it's quite a drastic change.

We're open to a PR.

mbsariyildiz commented 5 years ago

Hello,

I just realized that in the AlexNet paper the number of filters in the first conv layer is 96 (the last paragraph of page 4). But here in this repository, the module has 64 filters. I am wondering if it was implemented this way on purpose?

Thanks

fmassa commented 5 years ago

@mbsariyildiz the difference you are mentioning comes from the fact that the original implementation differs depending on the number of GPUs used; see https://github.com/pytorch/vision/pull/463#issuecomment-379214941 for more details
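For comparison, a sketch of the two first-layer configurations (64 filters as in torchvision; 96 is the paper's combined count across its two-GPU split, with padding assumed to match the torchvision variant):

```python
import torch.nn as nn

# First conv layer with 64 filters, as in torchvision's AlexNet.
conv1_torchvision = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)

# First conv layer with the paper's 96 filters (48 per GPU across two GPUs).
conv1_paper = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2)
```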

vbogach commented 4 years ago

There are two more issues in this AlexNet implementation:

  1. LocalResponseNorm layers are missing, but present in the original paper.
  2. This implementation uses AdaptiveAvgPool2d before first fully connected layer. The paper states that max pooling should be used instead.
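For point 1, the paper's normalization is available in PyTorch as `nn.LocalResponseNorm`; a sketch with the paper's hyperparameters (k=2, n=5, alpha=1e-4, beta=0.75):

```python
import torch
import torch.nn as nn

# Local response normalization with the AlexNet paper's hyperparameters;
# torchvision's AlexNet omits this layer.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

# In the paper it is applied after the ReLU of the first two conv layers;
# the 1x96x55x55 shape here mirrors the paper's first-layer output.
x = torch.randn(1, 96, 55, 55)
y = lrn(x)
```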

fmassa commented 4 years ago

@vbogach Hi,

About point 2, that's not exactly correct. We do use max_pool (see https://github.com/pytorch/vision/blob/5306635396d1ebd3047d56b16745350cf8c1e2c4/torchvision/models/alexnet.py#L31), but we added the avg_pool to make the model support images of different sizes. For images of size 224x224, the avg_pool is a no-op.
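The no-op behavior can be checked directly (a sketch; 256x6x6 is the feature-map shape AlexNet's conv/max-pool stack produces for a 224x224 input):

```python
import torch
import torch.nn as nn

# Adaptive average pooling to (6, 6) on a 6x6 feature map averages each
# cell over itself, so the tensor passes through unchanged.
avgpool = nn.AdaptiveAvgPool2d((6, 6))
x = torch.randn(2, 256, 6, 6)
assert torch.allclose(avgpool(x), x)
```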

About point 1, that's correct, we do not have LRN. IIRC, this actually doesn't make much of a difference.

Given that this implementation has been around for a while, we will probably not be changing it.