mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

[Unet3d] - Add infinite data loader to align epochs->samples transition #697

Open mmarcinkiewicz opened 5 months ago

mmarcinkiewicz commented 5 months ago

The main change is to replace the standard dataloaders (which finish at each epoch) with "infinite" ones that sample uniformly from the dataset until the program terminates (see the sketch below). This technically might change the order of samples, but:

  1. Everyone is using a fairly large batch size (e.g. 56, which is one third of the dataset)
  2. I checked the distribution of samples, and throughout training it is indistinguishable from the previous behavior
  3. The new RCPs are very similar to the old ones; I'll open a PR soon. Some RCPs are in fact a bit faster than before, and I'm not sure whether that's related to the new behavior, to a bug in the old behavior when we switched to samples, or just to variance

There is no need to modify the submission code.
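
For illustration, here is a minimal sketch of the infinite-sampling idea described above, assuming a PyTorch map-style training dataset. The names `InfiniteRandomSampler`, `train_dataset`, `train_step`, and `max_samples` are hypothetical and not taken from the PR; the actual change lives in the Unet3D reference dataloader code.

```python
# A minimal sketch (not the exact PR code) of an "infinite" dataloader:
# instead of a sampler that is exhausted at the end of each epoch, indices
# are drawn uniformly at random until training is stopped.
import torch
from torch.utils.data import DataLoader, Sampler


class InfiniteRandomSampler(Sampler):
    """Yields dataset indices uniformly at random, without ever exhausting."""

    def __init__(self, dataset_len: int, seed: int = 0):
        self.dataset_len = dataset_len
        self.generator = torch.Generator()
        self.generator.manual_seed(seed)

    def __iter__(self):
        while True:
            # One uniform draw per sample; the loader never raises StopIteration.
            yield torch.randint(
                self.dataset_len, (1,), generator=self.generator
            ).item()


# Hypothetical usage: iterate by sample count rather than by epoch.
# loader = DataLoader(train_dataset, batch_size=56,
#                     sampler=InfiniteRandomSampler(len(train_dataset)))
# samples_seen = 0
# for batch in loader:
#     train_step(batch)
#     samples_seen += 56
#     if samples_seen >= max_samples:
#         break
```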

github-actions[bot] commented 5 months ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅