implus / PytorchInsight

A PyTorch library with state-of-the-art architectures, pretrained models, and real-time updated results

About newly added ShuffleNetV2 models #10

Closed d-li14 closed 5 years ago

d-li14 commented 5 years ago

@implus Hi, thanks for this awesome work! Could you please report the performance you got with the ShuffleNetV2 models? It would also be highly appreciated if you could share your training configs.

implus commented 5 years ago

Yes, of course. Reaching the reported performance when training ShuffleNetV2 in PyTorch is a little harder, but we are almost there. We will provide the model and training configs soon; thanks for waiting.

d-li14 commented 5 years ago

Totally agree, and I look forward to the new release.

implus commented 5 years ago

Hi, we have uploaded the shufflenetv2_1x model with 69.6% accuracy, which exactly matches the performance reported in the original paper. The training scripts and commands are also released; please see the repo for details. Thanks for waiting!

d-li14 commented 5 years ago

It seems that the smaller learning rate and batch size, as well as the longer training schedule, make the difference. I previously tried the large batch size and correspondingly large learning rate stated in the paper (cosine LR decay over 240 epochs), but got 0.6% lower accuracy.
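For context, here is a minimal sketch of the paper-style schedule described above, written with standard PyTorch APIs. The batch size, base learning rate, and weight decay values are illustrative assumptions, not the released config:

```python
import torch
from torchvision.models import shufflenet_v2_x1_0

# Paper-style recipe: large batch, correspondingly large base LR,
# cosine decay over 240 epochs. Concrete values below are assumptions.
model = shufflenet_v2_x1_0()
epochs = 240       # cosine decay horizon, as stated for the paper's schedule
batch_size = 1024  # assumed "large batch" setting, fed to the (elided) DataLoader

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.5,                # assumed large LR, scaled with the batch size
    momentum=0.9,
    weight_decay=4e-5,     # common choice for ShuffleNets (assumption)
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training pass over ImageNet goes here ...
    scheduler.step()       # step the cosine schedule once per epoch
```

The discussion above suggests that shrinking the batch size and learning rate and lengthening this schedule is what recovers the last ~0.6% of accuracy.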

implus commented 5 years ago

Yes, that's it. Note that using the same settings to train MobileNetV2_1x, we get 73.4% top-1 and 91.3% top-5, which is considerably higher than the pretrained models in your released repo (72.192 / 90.534, as in https://github.com/d-li14/mobilenetv2.pytorch).

d-li14 commented 5 years ago

Yes, I agree. I also noticed this phenomenon while training the MobileNets, but such long training runs are expensive and beyond my available computing resources. In the end, I opted for a more efficient training configuration for all the MobileNets across the different width multipliers and input resolutions, as can be seen in that repo.
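As an aside, the width-multiplier and input-resolution axes mentioned here can be sketched with torchvision's MobileNetV2 builder; the specific (multiplier, resolution) pairs below are illustrative assumptions, not the exact grid from the mobilenetv2.pytorch repo:

```python
import torch
from torchvision.models import mobilenet_v2

# Width multiplier scales the channel counts; input resolution is set by
# the data. The (multiplier, resolution) pairs below are assumptions.
for width_mult, resolution in [(1.0, 224), (0.75, 192), (0.5, 160)]:
    model = mobilenet_v2(width_mult=width_mult).eval()
    x = torch.randn(1, 3, resolution, resolution)
    with torch.no_grad():
        logits = model(x)  # global average pooling makes the classifier
                           # resolution-agnostic, so smaller inputs work directly
    print(width_mult, resolution, tuple(logits.shape))
```

Each (multiplier, resolution) variant is a separate model trained from scratch, which is why sweeping the full grid multiplies the training cost and motivates the more efficient shared configuration mentioned above.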