Open experiencor opened 1 month ago
@experiencor The reason is that, with "choose" set on the stream, StreamingDataset upsamples or downsamples each stream. The random seed for that sampling depends on the epoch, so the samples you get in the 1st epoch differ from those in the 2nd. The randomness comes from here
I agree this is a bit counterintuitive, since you have shuffle=False. We'll put in a fix for it, hopefully soon.
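To make the mechanism concrete, here is a toy sketch in plain Python. It is not the library's actual code: `choose_per_epoch`, `base_seed`, and the "seed = base_seed + epoch" rule are hypothetical stand-ins chosen only to illustrate how an epoch-dependent seed changes which samples are picked, even when the order is left unshuffled.

```python
import random


def choose_per_epoch(num_samples, choose, base_seed, epoch):
    """Toy model of per-stream downsampling (hypothetical, not streaming's code).

    Picks `choose` sample IDs out of `num_samples` with an RNG whose seed
    depends on the epoch, then returns them in ascending order to mimic
    an unshuffled (shuffle=False) iteration.
    """
    rng = random.Random(base_seed + epoch)  # assumed rule: seed changes each epoch
    picked = rng.sample(range(num_samples), choose)
    return sorted(picked)


epoch0 = choose_per_epoch(num_samples=1000, choose=10, base_seed=42, epoch=0)
epoch1 = choose_per_epoch(num_samples=1000, choose=10, base_seed=42, epoch=1)
epoch0_again = choose_per_epoch(num_samples=1000, choose=10, base_seed=42, epoch=0)

print("epoch 0:", epoch0)
print("epoch 1:", epoch1)
print("same epoch reproducible:", epoch0 == epoch0_again)
print("epochs differ:", epoch0 != epoch1)
```

In this sketch, replaying the same epoch gives the same list, but moving to the next epoch changes the subset. That matches the behavior reported here: the order is deterministic, but the set of chosen samples is not constant across epochs.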
OS: [macOS] mosaicml-streaming==0.7.6
To reproduce
Steps to reproduce the behavior:
from streaming import StreamingDataset, Stream
Expected behavior
The same list of examples for 2 iterations of the dataset when shuffle=False.
Actual behavior
A different list of examples for 2 iterations of the dataset.