I noticed that the data was not shuffled correctly while training. It seems that `set_epoch` should be added when distributed training is used. See here: https://github.com/pytorch/examples/blob/fe8abc3c810420df2856c6e668258f396b154cee/imagenet/main.py#L232. See here as well: https://pytorch.org/docs/stable/data.html

_In distributed mode, calling the set_epoch() method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used._
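For reference, a minimal sketch of what the fix looks like, assuming the training loop already uses a `DistributedSampler` and that `torch.distributed` has been initialized; the dataset and `num_epochs` here are placeholders for illustration only:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset; assumes torch.distributed.init_process_group()
# has already been called so the sampler can infer rank/world size.
dataset = TensorDataset(torch.arange(1000).float())
num_epochs = 10

# DistributedSampler shuffles deterministically from (seed, epoch).
sampler = DistributedSampler(dataset, shuffle=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(num_epochs):
    # Without this call, every epoch reuses the epoch-0 ordering.
    sampler.set_epoch(epoch)
    for batch in loader:
        ...  # training step
```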