mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

[RN50] What should Drop_train_remainder be set to? #489

Closed · nv-rborkar closed this issue 1 year ago

nv-rborkar commented 3 years ago

As per the rules [here](https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#95-equivalence-exceptions), quoted below:

> If data set size is not evenly divisible by batch size, one of several techniques may be used. The last batch in an epoch may be composed of the remaining samples in the epoch, may be padded, or may be a mixed batch composed of samples from the end of one epoch and the start of the next. If the mixed batch technique is used, quality for the ending epoch must be evaluated after the mixed batch. If the padding technique is used, the first batch may be padded instead of the last batch.
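(For scale, a quick and purely illustrative calculation of how large that leftover last batch is in the RN50/ImageNet case; the global batch size below is an example and is not taken from this issue.)

```python
# Illustrative only: size of the final (partial) batch per epoch.
train_size = 1_281_167   # ImageNet-1k training images
global_batch = 1024      # example global batch size, not from the issue

full_batches, remainder = divmod(train_size, global_batch)
print(full_batches, remainder)  # 1251 full batches, 143 leftover samples
```

Dropping the remainder would silently discard those leftover samples every epoch, which is what the rule's three techniques (partial last batch, padding, or a mixed batch) are meant to avoid.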

Our interpretation of this rule was to never drop the remainder images of the dataset. However, we would like confirmation that the reference RCP runs and all other submissions do the same, i.e., set the drop_train_remainder flag to False. We can try to align on this for v1.1.
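(For clarity, a minimal tf.data sketch of what such a flag typically controls; the function name and flag wiring here are illustrative, not the reference implementation's actual code.)

```python
import tensorflow as tf

def make_train_dataset(dataset, global_batch_size, drop_train_remainder=False):
    # drop_train_remainder=False keeps the final, smaller batch so that no
    # training samples are discarded when the dataset size is not evenly
    # divisible by the batch size.
    return dataset.batch(global_batch_size, drop_remainder=drop_train_remainder)

# Toy example: 10 samples, batch size 4.
ds = tf.data.Dataset.range(10)
for batch in make_train_dataset(ds, 4, drop_train_remainder=False):
    print(batch.numpy())
# -> [0 1 2 3], [4 5 6 7], [8 9]  (last batch has only 2 elements)
```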

This was observed during the v1.0 review period via https://github.com/mlcommons/submission_training_1.0/issues/48

johntran-nv commented 1 year ago

@nv-rborkar is this still relevant? If so, please assign the right person.

nv-rborkar commented 1 year ago

This is a question for the reference owner. @sgpyc, can you please answer it so we can close the issue accordingly?

peladodigital commented 1 year ago

In an effort to clean up the git repo so we can maintain it better going forward, the MLPerf Training working group is closing out issues older than 2 years, since much has changed in the benchmark suite. If you think this issue is still relevant, please feel free to reopen it. Even better, please come to the working group meeting to discuss your issue.