kaixiao closed this issue 2 years ago
@kaixiao it seems that in your example you have `drop_last=True`. We aim to maximize compatibility with the PyTorch DataLoader, and it seems that we have exactly the same behavior:
```python
In [3]: ch.utils.data.TensorDataset(ch.Tensor([(0, 1), (1, 2)]))
Out[3]: <torch.utils.data.dataset.TensorDataset at 0x7f824f9a47c0>

In [4]: dataset = ch.utils.data.TensorDataset(ch.Tensor([(0, 1), (1, 2)]))

In [5]: loader = ch.utils.data.DataLoader(dataset, batch_size=3, drop_last=True)

In [6]: len(loader)
Out[6]: 0
```
Feel free to reopen if you see a discrepancy with pytorch's DataLoader!
Thanks for the clarification, @GuillaumeLeclerc! In that case, I'm wondering if the default behavior for ffcv loaders should be `drop_last=False` instead, since that is the PyTorch default. But it's helpful to know that this is an easy fix.
Default `drop_last` behavior in PyTorch:

```python
In [2]: dataset = ch.utils.data.TensorDataset(ch.Tensor([(0, 1), (1, 2)]))

In [3]: loader = ch.utils.data.DataLoader(dataset, batch_size=3)

In [4]: len(loader)
Out[4]: 1
```
That's definitely an oversight on our part. I'm not sure changing it now is the right move, though, since people may already be relying on the default value.
Initializing the loader as follows:
results in a loader of length 0 when `batch_size > len(indices)`.
I think a warning, an error, or a length-1 loader containing exactly `len(indices)` elements would all be reasonable behaviors, and all more intuitive than silently returning a length-0 loader.
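For reference, the two possible behaviors can be contrasted with the stock PyTorch `DataLoader` (a minimal sketch; this does not reproduce the ffcv `Loader` initialization above, just the analogous `drop_last` semantics):

```python
import torch

# Dataset with only 2 samples, loaded with batch_size=3.
dataset = torch.utils.data.TensorDataset(torch.arange(2.0))

# With drop_last=True, the single undersized batch is dropped entirely,
# so the loader has length 0 and iterating over it yields nothing.
empty = torch.utils.data.DataLoader(dataset, batch_size=3, drop_last=True)
print(len(empty))    # 0

# With drop_last=False (the PyTorch default), the partial batch is kept,
# giving a length-1 loader containing all len(dataset) elements.
partial = torch.utils.data.DataLoader(dataset, batch_size=3, drop_last=False)
print(len(partial))  # 1
```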