mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Allowed optimizers for Image Classification using Pytorch #516

Open mrmhodak opened 1 year ago

mrmhodak commented 1 year ago

Currently, the Image Classification/LARS/PyTorch combination lists no compliant optimizers.

However, the 2.2 submission by Habana used what looks like a FusedLars optimizer (their own implementation) for PyTorch on ResNet.

Does that mean that built-in FusedLars is MLPerf compliant for Image Classification?

mrmhodak commented 1 year ago

Actually, the optimizers that we want to use are FusedLAMB and FusedSGD, imported from apex.optimizers:

from apex.optimizers import FusedLAMB, FusedSGD

Would these be MLPerf-compliant?

nv-rborkar commented 1 year ago

@sgpyc to keep me honest. LAMB is not an allowed optimizer for RN50. Only LARS and SGD are allowed. https://github.com/mlcommons/training/tree/master/image_classification#optimizer

The rules already allow apex.optimizers.FusedSGD here.
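For context on why FusedSGD is acceptable: it implements the same heavy-ball momentum update as the reference torch.optim.SGD, just fused into a single kernel. A minimal sketch of that update in plain Python (names and defaults here are illustrative, not apex's actual API):

```python
# Sketch of the SGD-with-momentum update that a fused optimizer collapses
# into one kernel. Plain Python over lists; in practice this runs on tensors.

def sgd_momentum_step(w, g, v, lr, momentum=0.9, weight_decay=0.0):
    """One heavy-ball step: v <- mu*v + (g + wd*w); w <- w - lr*v."""
    new_v = [momentum * vi + (gi + weight_decay * wi)
             for vi, gi, wi in zip(v, g, w)]
    new_w = [wi - lr * vi for wi, vi in zip(w, new_v)]
    return new_w, new_v
```

With momentum=0 and weight_decay=0 this reduces to vanilla SGD; a fused implementation is compliant as long as it computes this same update.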

mrmhodak commented 1 year ago

3/23 update: The team is working on our own FusedLars implementation for PyTorch. I should have code to share for next week's meeting.

mrmhodak commented 1 year ago

Here is the Fused Lars code: https://github.com/ROCmSoftwarePlatform/apex/blob/master/apex/optimizers/fused_lars.py

Please review whether it is good to use.
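For reviewers less familiar with LARS, here is a minimal per-tensor sketch of the layer-wise adaptive rate scaling update (the function name, defaults, and trust-ratio form are illustrative; the linked fused_lars.py is the actual implementation under review):

```python
import math

# Illustrative LARS step for a single parameter tensor, as plain Python lists.
# LARS scales the learning rate per layer by a trust ratio ||w|| / ||g||,
# then applies heavy-ball momentum on the rescaled gradient.

def lars_step(w, g, v, lr, momentum=0.9, weight_decay=0.0,
              trust_coeff=0.001, eps=1e-8):
    w_norm = math.sqrt(sum(x * x for x in w))
    g_norm = math.sqrt(sum(x * x for x in g))
    if w_norm > 0.0 and g_norm > 0.0:
        # Layer-wise trust ratio; weight decay enters the denominator.
        local_lr = trust_coeff * w_norm / (g_norm + weight_decay * w_norm + eps)
    else:
        local_lr = 1.0
    new_v = [momentum * vi + local_lr * lr * (gi + weight_decay * wi)
             for vi, gi, wi in zip(v, g, w)]
    new_w = [wi - vi for wi, vi in zip(w, new_v)]
    return new_w, new_v
```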

nv-rborkar commented 1 year ago

Thanks, the implementation looks good as long as Nesterov momentum is not used (the reference doesn't use Nesterov).
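To illustrate why the nesterov flag matters for compliance: Nesterov momentum applies a step that looks one momentum update ahead, so from identical state it produces a different weight trajectory than plain heavy-ball momentum. A hedged scalar sketch of the two variants (illustrative, following the convention torch.optim.SGD uses):

```python
# Scalar sketch contrasting plain and Nesterov momentum.
# Plain:    step = v_new           where v_new = mu*v + g
# Nesterov: step = g + mu*v_new   (the "look-ahead" form)

def momentum_step(w, g, v, lr, mu, nesterov=False):
    v_new = mu * v + g
    step = g + mu * v_new if nesterov else v_new
    return w - lr * step, v_new
```

The two variants already diverge after a single step, which is why a nesterov=True setting would not match the reference.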