We randomly resample the shards (with replacement) and sample examples in buffer for training every time we resume/start the training run. This means our data loading is not determinitsic. We also don't do epoch based training but just using this for book keeping and being able to reuse the same training loop with other datasets/loaders.
Optionally make this more deterministic for reproducibility
We randomly resample the shards (with replacement) and sample examples in buffer for training every time we resume/start the training run. This means our data loading is not determinitsic. We also don't do epoch based training but just using this for book keeping and being able to reuse the same training loop with other datasets/loaders.
Optionally make this more deterministic for reproducibility