kuangliu / pytorch-cifar

95.47% on CIFAR10 with PyTorch

Can't achieve the reported accuracy on MobileNetV2 #74

Open · lollllcat opened 5 years ago

lollllcat commented 5 years ago

I tried to replicate the experiments on ResNet, VGG, MobileNet, and MobileNetV2.

For ResNet and VGG, I actually get better results than reported (around 2% higher).

However, for MobileNetV2 I can only get to about 90.1%, which is much lower than the reported 94.5%.

Is there something I missed during training? Or should I be using different learning rates for MobileNet and MobileNetV2?

I am following everything from the README, including the learning rate decay at epochs 150 and 250, the optimizer settings, etc.
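For concreteness, the recipe I'm following looks roughly like this (a minimal sketch; `net` and `train_one_epoch` are placeholder names, and I'm assuming the SGD defaults from the repo's main.py):

```python
import torch.optim as optim

# Minimal sketch of the README recipe. `net` is the model under test and
# `train_one_epoch` is a hypothetical helper standing in for the usual
# train/test loop; lr, momentum, and weight decay follow main.py.
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(350):
    train_one_epoch(net, optimizer)  # forward/backward/step over CIFAR-10
    scheduler.step()                 # lr: 0.1 -> 0.01 at 150, -> 0.001 at 250
```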

jmaronas commented 5 years ago

Same for me; I'm getting an accuracy of 91%.

caiwenpu commented 5 years ago

Same for me. I could only get ~91% accuracy for MobileNetV2 with this code.

I think this is because there are too many downsample layers (three of them), leaving only 4x4 feature maps before the final average pooling; see the sketch below.

Here is a repository, https://github.com/tinyalpha/mobileNet-v2_cifar10, which uses only two downsample layers and gets ~94.5% accuracy for MobileNetV2.
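To make the downsampling arithmetic concrete, here is a quick sketch (only the number of stride-2 stages matters; `final_feature_size` is a hypothetical helper, not code from either repo):

```python
# Quick sketch of how stride-2 stages shrink a 32x32 CIFAR-10 input.
def final_feature_size(input_size, downsample_strides):
    """Spatial size remaining before the final average pooling."""
    size = input_size
    for s in downsample_strides:
        size //= s
    return size

print(final_feature_size(32, [2, 2, 2]))  # this repo: three downsamples -> 4x4
print(final_feature_size(32, [2, 2]))     # tinyalpha's variant: two -> 8x8
```

With 8x8 maps, the later stages see four times as many spatial positions, which appears to be what recovers the accuracy.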

lollllcat commented 5 years ago


You are right; the performance on ShuffleNet, ShuffleNetV2, and MobileNet is also not very high.

jmaronas commented 5 years ago

That is cool. Thanks for the information and the repo. By the way, do you know of a ShuffleNet implementation with better accuracy on CIFAR-10? If I am not mistaken, I reached 90% on CIFAR-10 and 70% on CIFAR-100.


lollllcat commented 5 years ago


I think running the code here you can get around 91% with ShuffleNetG2 and 90.8% with ShuffleNetG3. Otherwise, I am not sure.

ngcthuong commented 5 years ago

Same here. I only got MobileNet 89.83%, MobileNetV2 92.17%, and ShuffleNetG2 90.32%. I used learning rates of 0.1, 0.01, and 0.0001 for epochs 0-25, 25-50, and 50-100 (100 epochs total).

thanhmvu commented 4 years ago

For those who missed it, there is a closed issue on the hyperparameters of MobileNetV2 at #29, in which @zhaohui-yang pointed out that the weight decay should be set to 4e-5. My results also confirm this. With everything else the same (SGD, 350 epochs, batch size 128, lr 0.1, steps [150, 250]), I got my best accuracy at:
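If it helps anyone reproduce this, the change amounts to a single argument in the optimizer setup (a sketch assuming the repo's main.py defaults, where the weight decay is 5e-4; `net` is a placeholder for the repo's MobileNetV2):

```python
import torch.optim as optim

# Same recipe as the README, with only the weight decay changed
# (5e-4 -> 4e-5), per the suggestion in issue #29.
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=4e-5)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)
```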

pdejorge commented 3 years ago

I've not been able to reproduce the 94.43% accuracy reported in the repo. I changed the weight decay to 4e-5 as suggested and ran the code several times with different random seeds, but the best I got was 93.7%. Has anyone had the same issue? @thanhmvu did you only update the weight decay, changing nothing else?

thanhmvu commented 3 years ago

@pdejorge In case it helps, I ran these on a single GPU. I believe all the other hyperparameters were the same; otherwise I think I would have mentioned those details, given that I listed the epochs of my best results for specificity. I don't remember if or where I kept the code/logs for these runs, so I'm sorry I can't be more helpful than that.