mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

[RNN-T] clarify gradient clipping status #461

Closed mwawrzos closed 3 years ago

mwawrzos commented 3 years ago

apex.optimizers.FusedLAMB includes gradient clipping, with the maximum gradient norm set to 1 by default (documentation, code).

The reference implementation contains the parameter clip_norm: https://github.com/mlcommons/training/blob/8f7f74f88874ae85a58ddedd778c320739b37444/rnn_speech_recognition/pytorch/train.py#L86-L87 This parameter controls gradient clipping performed outside the optimizer: https://github.com/mlcommons/training/blob/8f7f74f88874ae85a58ddedd778c320739b37444/rnn_speech_recognition/pytorch/train.py#L466-L470 It is frozen to None (training_policies, compliance checker). Such a constant may mislead submitters into thinking that the reference doesn't clip gradients.
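
To illustrate why `clip_norm = None` does not mean "no clipping", here is a minimal pure-Python sketch of the two clipping stages. It is a simplified stand-in, not the apex kernel or the reference `train.py`: the function names and the flat-list gradient representation are assumptions made for illustration, and only the default of 1 for the optimizer's maximum gradient norm comes from the description above.

```python
import math

def global_l2_norm(grads):
    """L2 norm over all gradient values, treated as one flat vector."""
    return math.sqrt(sum(g * g for g in grads))

def clip_by_global_norm(grads, max_norm):
    """Scale grads so that their global L2 norm does not exceed max_norm."""
    norm = global_l2_norm(grads)
    if norm <= max_norm:  # also true whenever max_norm is inf
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def effective_grads(grads, clip_norm=None, optimizer_max_grad_norm=1.0):
    """Hypothetical model of one training step's gradient handling."""
    # Stage 1: clipping outside the optimizer, skipped when clip_norm is None.
    if clip_norm is not None:
        grads = clip_by_global_norm(grads, clip_norm)
    # Stage 2: clipping inside the optimizer (assumed default of 1).
    return clip_by_global_norm(grads, optimizer_max_grad_norm)

clipped = effective_grads([3.0, 4.0], clip_norm=None)
print(global_l2_norm(clipped))
```

Even with `clip_norm=None`, the gradient `[3.0, 4.0]` (norm 5) comes out rescaled to a norm of about 1, because the second stage still applies. Passing `optimizer_max_grad_norm=float("inf")` in this sketch disables that stage as well.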

This PR removes that confusion. Behavior is unchanged: the change only exposes the optimizer's default value. To minimize the impact of this late change, submitters may use a parameter value of either 1 or inf (see https://github.com/mlcommons/training_policies/pull/433).
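
The "either 1 or inf" rule can be expressed as a small validation helper. This is only a sketch of the rule as stated in this description: the function name is hypothetical, and it is not the actual compliance checker logic.

```python
import math

def is_allowed_max_grad_norm(value):
    """Return True if value is one of the two permitted settings: 1 or inf.

    Hypothetical helper; the real compliance checker may express this differently.
    """
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        return False
    return value == 1 or (math.isinf(value) and value > 0)

for candidate in (1, 1.0, float("inf"), 0.5, None):
    print(candidate, is_allowed_max_grad_norm(candidate))
```

A value of inf corresponds to a submitter who does not clip at all, while 1 matches the optimizer default that the reference has been using.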

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

johntran-nv commented 3 years ago

Hi @petermattson, could someone from Google also approve? I want to make sure all submitters are good with these PRs, since we don't have time to discuss them in the SWG. I couldn't assign this to Elias; he seems to be missing from the project.

petermattson commented 3 years ago

+Elias Mizan, could you please take a quick look at this? A GitHub issue prevents me from adding you as a reviewer. Thanks much! :-)

petermattson commented 3 years ago

I approved, but it says that you have to approve.
