Closed rnaidu02 closed 4 years ago
This was discussed during the meeting to try to understand the issue. We can follow up offline based on that discussion.
Is there still an issue here? Please reach out over email.
AI (Christine): Follow up over email.
We are OK to close.
MXNet ResNet-50 with LARS at a 4K batch size does not converge within 60 epochs. Details follow:
The Intel team used MXNet's built-in mxnet.optimizer.lars and mxnet.lr_scheduler.PolyScheduler, together with the resnet50_v1b model from the MXNet GluonCV model zoo. For hyperparameters, we used the values from Google’s last submission with a 4K batch size (128 x 32 TPU): https://github.com/mlperf/training_results_v0.6/blob/master/Google/results/tpu-v3-32/resnet/result_0.txt
We want to know whether we are missing something: with the same LARS hyperparameter configuration applied to the ResNet-50 model from the MXNet GluonCV model zoo, we cannot reproduce the convergence results reported by NVIDIA and Google.
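To make the comparison concrete, here is a minimal sketch of the two pieces that tend to differ between frameworks: the layer-wise LARS learning rate and a polynomial decay schedule with linear warmup. It is plain Python written for illustration only. It is not the code of mxnet.optimizer.lars, mxnet.lr_scheduler.PolyScheduler, or any MLPerf submission, and the function names, the `(step + 1)` warmup convention, the placement of `eps`, and every numeric value are assumptions. The hyperparameters from the linked result file are deliberately not reproduced here.

```python
import math


def poly_lr(step, base_lr, warmup_steps, total_steps, power=2.0, final_lr=0.0):
    """Linear warmup to base_lr, then polynomial decay to final_lr.

    Assumption: warmup ramps as (step + 1) / warmup_steps; real
    implementations may start from a different value or offset.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return final_lr + (base_lr - final_lr) * (1.0 - progress) ** power


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_lr(weights, grads, eta, weight_decay, eps=0.0):
    """Layer-wise LARS scaling: eta * ||w|| / (||g|| + wd * ||w|| + eps).

    Assumption: a layer with a zero weight or gradient norm falls back to
    a scale of 1.0; implementations differ on this and on where eps goes.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return eta * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    for step in (0, 9, 10, 55, 100):
        print(step, poly_lr(step, base_lr=1.0, warmup_steps=10, total_steps=100))
    print(lars_local_lr([3.0, 4.0], [0.0, 5.0], eta=0.001, weight_decay=0.0))
```

When comparing runs, it may help to log the per-layer value of `lars_local_lr` and the scheduled rate at a few fixed steps in each framework. Differences in which parameters skip LARS scaling (for example batch-norm and bias terms), in the warmup convention, or in the `eps` handling would show up there before they show up in accuracy.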