mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training
Apache License 2.0

Add a rule about DLRM training data shuffling #441

Open johntran-nv opened 3 years ago

johntran-nv commented 3 years ago

The data shuffling rules for DLRM were not clear enough in the v0.7 round and left a lot of room for interpretation. This update adds a clear rule that is easy to follow and should not impact the convergence or performance of DLRM implementations.

This was originally part of https://github.com/mlcommons/training_policies/pull/411, which we discussed, but I mistakenly closed that PR thinking it was only about packing, which it is no longer used for. It is cleaner to break data shuffling out into its own PR anyway.
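The PR text itself is not quoted in this thread, so as a rough illustration only: a shuffling rule that is "easy to follow" and reproducible across implementations typically amounts to a deterministic, full-dataset permutation per epoch. The sketch below shows one way that can look; the function name `shuffled_batches` and the seed-plus-epoch scheme are assumptions for illustration, not the rule adopted in the PR.

```python
import numpy as np

def shuffled_batches(num_samples, batch_size, epoch, seed=0):
    """Yield batches of sample indices, reshuffled every epoch.

    Deriving the RNG seed from a fixed base seed plus the epoch
    (an assumption here, not the PR's wording) makes the shuffle
    deterministic and reproducible across implementations.
    """
    rng = np.random.default_rng(seed + epoch)
    order = rng.permutation(num_samples)  # full-dataset shuffle
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

# Example: iterate two epochs over a toy dataset of 10 samples.
for epoch in range(2):
    for batch in shuffled_batches(num_samples=10, batch_size=4, epoch=epoch):
        print(epoch, batch)
```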

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

johntran-nv commented 3 years ago

+emizan@google.com, +deepak.r.canchi@intel.com, could you please review/approve?

johntran-nv commented 3 years ago

Deepak suggested that it is too late to change this for v1.0, which is fair. Let's defer the discussion to v1.1.

Separately, it looks like I inadvertently merged this, maybe as part of another PR. I'll go fix that now as well.