mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

packing/padding rules updates #411

Closed · johntran-nv closed this 3 years ago

johntran-nv commented 3 years ago

Clarifying packing/padding rules for https://github.com/mlcommons/training_policies/issues/376.

johntran-nv commented 3 years ago

Perhaps we should discuss again at the next meeting, but my memory from the last discussion is that we wanted to make sure people did not use specific packs to improve their convergence. I specifically recall we discussed not wanting to allow arbitrary packing (i.e., packing all the length-1 sequences together).
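
For context, a minimal sketch of the two packing strategies at issue. The function names and the greedy first-fit-decreasing strategy are illustrative assumptions, not anything specified in the rules; the point is only that a length-sorted packer systematically groups short sequences into the same training example, while an order-preserving packer does not.

```python
# Illustrative sketch (not from the thread): two ways to form fixed-length
# training examples from variable-length sequences. "Arbitrary" packing that
# sorts by length changes which sequences share an example, which is the
# convergence concern raised above.

def pack_greedy_sorted(sequences, max_len):
    """Length-sorted first-fit packing: maximizes density, but
    systematically groups similar-length sequences together."""
    bins = []
    for seq in sorted(sequences, key=len, reverse=True):
        for b in bins:
            if sum(len(s) for s in b) + len(seq) <= max_len:
                b.append(seq)
                break
        else:
            bins.append([seq])
    return bins

def pack_in_order(sequences, max_len):
    """Order-preserving packing: fills each example with sequences in
    dataset order, so pack composition is not length-correlated."""
    bins, cur, cur_len = [], [], 0
    for seq in sequences:
        if cur and cur_len + len(seq) > max_len:
            bins.append(cur)
            cur, cur_len = [], 0
        cur.append(seq)
        cur_len += len(seq)
    if cur:
        bins.append(cur)
    return bins

# Example: the sorted strategy packs all the short sequences together.
seqs = [[0] * n for n in (1, 1, 1, 7, 8, 2, 6, 1)]
print([[len(s) for s in b] for b in pack_greedy_sorted(seqs, 8)])
# -> [[8], [7, 1], [6, 2], [1, 1, 1]]
print([[len(s) for s in b] for b in pack_in_order(seqs, 8)])
# -> [[1, 1, 1], [7], [8], [2, 6], [1]]
```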

mrinaliyer commented 3 years ago

Let's indeed discuss this again. IMHO, disallowing the most efficient packing should also disallow bucketing elsewhere, since the two have a similar overall effect; a sketch of the parallel follows below. However, as a mitigating factor, there is a requirement to run a minimum of 3M steps, so I would imagine that faster convergence from particular packs wouldn't matter much.
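
To make the parallel concrete, a minimal sketch of length bucketing; the bucket boundaries and function name are illustrative assumptions, not from the thread. Like length-sorted packing, bucketing makes batch composition correlate with sequence length, which is the basis for arguing the two techniques should be treated the same way.

```python
# Illustrative sketch (not from the thread): length bucketing. Batches are
# drawn from length-homogeneous buckets, so, like length-sorted packing,
# batch composition correlates with sequence length.

from collections import defaultdict

def bucket_by_length(sequences, boundaries=(16, 32, 64, 128)):
    """Group sequences under the smallest boundary that fits them;
    each training batch is then sampled from a single bucket."""
    buckets = defaultdict(list)
    for seq in sequences:
        key = next((b for b in boundaries if len(seq) <= b), boundaries[-1])
        buckets[key].append(seq)
    return buckets

# Sequences of similar length end up in the same bucket, so every batch
# drawn from a bucket is length-homogeneous, much like a length-sorted pack.
seqs = [list(range(n)) for n in (3, 12, 20, 31, 90, 5, 60)]
for key, group in sorted(bucket_by_length(seqs).items()):
    print(key, [len(s) for s in group])
# -> 16 [3, 12, 5]
#    32 [20, 31]
#    64 [60]
#    128 [90]
```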

bitfort commented 3 years ago

I think "(d) hyperparameter borrowing is still possible, meaning that same set of hyperparameters should converge similarly with or without packing." is well intentioned and the right goal. I am not sure if this is actually true. I think an opinion from a submitter who submits packed language models is needed to validate this doesn't disqualify their previous work.

johntran-nv commented 3 years ago

This PR is no longer valid. We are following up in #418. Closing.