Closed: d-li14 closed this issue 5 years ago
Yes, of course. Training ShuffleNetV2 in PyTorch makes it a bit harder to reach the reported performance, but we are almost there. We will soon provide the model and training configs; thanks for waiting.
Totally agree and look forward to the new release.
Hi, we have uploaded the shufflenetv2_1x model with 69.6% accuracy, which exactly matches the performance reported in the original paper. The training scripts and commands are also released. Please see the details in the repo. Thanks for waiting!
It seems that the smaller learning rate and batch size, as well as the longer training period, make the difference. I previously tried the large batch size and correspondingly large learning rate stated in the paper (cosine LR decay over 240 epochs), but got 0.6% lower accuracy.
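For reference, the cosine LR decay mentioned above follows a simple closed form. A minimal sketch of the per-epoch schedule is below; the 240-epoch horizon is from the paper's setting discussed here, while the base learning rate of 0.5 is only an illustrative placeholder, since the exact value used in these runs is not given in this thread.

```python
import math

def cosine_lr(epoch, total_epochs=240, base_lr=0.5):
    """Cosine learning-rate decay from base_lr down to 0.

    total_epochs=240 matches the schedule discussed above;
    base_lr=0.5 is a hypothetical value for illustration only.
    """
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# The schedule starts at base_lr, halves at the midpoint, and decays to 0:
print(cosine_lr(0))    # 0.5
print(cosine_lr(120))  # 0.25
print(cosine_lr(240))  # 0.0
```

In PyTorch the same behavior is available out of the box via `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=240)`, stepped once per epoch.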
Yes, that's it. Note that using the same setting to train MobileNetV2_1x, we get 73.4% top-1 and 91.3% top-5, which is considerably higher than the pretrained models in your released repo (72.192 / 90.534, as in https://github.com/d-li14/mobilenetv2.pytorch).
Yes, I agree. I actually noticed the same phenomenon when training the MobileNets, but that training schedule takes a long time, and the required computing resources are beyond what I have available. In the end, I opted for a more efficient training configuration for all the MobileNets across the different width multipliers and input resolutions, as you can see.
@implus Hi, thanks for this awesome work! Could you please report the ShuffleNetV2 performance you obtained? It would also be highly appreciated if you could share your training configs.