mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes
https://mlcommons.org/en/groups/training

[unet3D]: Rounding up epochs with mixed-batches and learning_rate schedules #427

Open mndevec opened 3 years ago

mndevec commented 3 years ago

There are 168 images in the Unet3D dataset, which is fewer than in the other benchmarks. Based on the rules here, if we were to use a batch size of 128, we could use the mixed-batch approach and merge images from 2 epochs into a single step.

This makes it a bit complicated to satisfy math equivalence for learning_rate schedules. The closest mathematically equivalent approach would be to scale the step numbers by (256 / 168) to match the reference epoch number. As in the other models, this still applies different learning_rates to partial epochs, but the difference might be more visible here due to the smaller dataset size.
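
As a rough illustration of the scaling described above, here is a minimal sketch (not the reference implementation or any MLPerf-mandated code) of how a step index could be converted to a fractional epoch and fed into an epoch-based learning_rate schedule, assuming every mixed-batch step consumes exactly the global batch size. The schedule shape and all constants other than the dataset size and the batch size from the question are hypothetical placeholders.

```python
DATASET_SIZE = 168        # Unet3D training images
GLOBAL_BATCH_SIZE = 128   # example batch size from the question

def step_to_epoch(step: int) -> float:
    """Fractional epochs completed after `step` mixed-batch steps."""
    return step * GLOBAL_BATCH_SIZE / DATASET_SIZE

def lr_at_epoch(epoch: float, base_lr: float = 1.0,
                warmup_epochs: float = 10.0,
                total_epochs: float = 100.0) -> float:
    """Placeholder epoch-based schedule: linear warmup, then linear decay."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    return base_lr * max(0.0, 1.0 - (epoch - warmup_epochs)
                         / (total_epochs - warmup_epochs))

# After 2 steps, 256 images have been consumed, i.e. 256/168 ~ 1.52 epochs,
# so the schedule is evaluated at a partial epoch rather than an integer one.
print(step_to_epoch(2), lr_at_epoch(step_to_epoch(2)))
```

Under this view, the (256 / 168) factor is just the fractional epoch reached after two 128-image steps, which is where the partial-epoch learning_rate applications come from.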

So I wanted to be sure that this still satisfies math equivalence under the current rules. Is that right?

mndevec commented 3 years ago

@johntran-nv @sergey-serebryakov Would it be possible to include this in the next working group meeting?