tristandeleu / pytorch-meta

A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch
https://tristandeleu.github.io/pytorch-meta/
MIT License

Why provide a BatchMetaDataLoader if meta-sets have the same API as normal pytorch data-sets? #76

Closed renesax14 closed 4 years ago

renesax14 commented 4 years ago

I was reading the very helpful paper for the library and saw this paragraph that confused me with respect to the implementation decisions, how the library works, and its intended usage:

2.4 Meta Data-loaders
The objects presented in Sections 2.1 & 2.2 can be iterated over to generate datasets from the meta-training set; these datasets are PyTorch Dataset objects, and as such can be included as part of any standard data pipeline (combined with DataLoader). Nonetheless, most meta-learning algorithms operate better on batches of tasks. Similar to how examples are batched together with DataLoader in PyTorch, Torchmeta exposes a MetaDataLoader that can produce batches of tasks when iterated over.

In particular, it says that the meta-sets (whether they inherit from CombinationMetaDataset or MetaDataset) are just normal PyTorch datasets. If they have the same API as normal PyTorch datasets, then why not just always pass them directly to the standard PyTorch data loaders? Why provide the interface at all:

dataloader = torchmeta.utils.data.BatchMetaDataLoader(dataset, batch_size=16)

I think a comment about this somewhere (probably in the paper) would be good.

I've of course read the paper (twice now), and I hope I didn't miss this detail if it was mentioned.

tristandeleu commented 4 years ago

BatchMetaDataLoader is just syntactic sugar for torch.utils.data.DataLoader, with a special collate function and sampler. The reason you'd want to use BatchMetaDataLoader for Torchmeta's datasets over torch.utils.data.DataLoader is that the defaults of torch.utils.data.DataLoader were designed specifically for standard supervised learning, not for the episodes meta-learning requires: elements of Torchmeta's datasets are indexed with tuples of classes, as opposed to the integers used for standard PyTorch datasets.
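To illustrate the indexing difference, here is a minimal pure-Python sketch (no Torchmeta or PyTorch required; `ToyCombinationDataset` and `ToyCombinationSampler` are hypothetical stand-ins, not the library's actual classes). A standard dataset is indexed by integers 0..len-1, which is exactly what DataLoader's default sampler yields; a combination meta-dataset is indexed by a tuple of classes identifying an N-way task, so it needs a sampler that yields tuples instead.

```python
from itertools import combinations
from math import comb

class ToyCombinationDataset:
    """Hypothetical stand-in for a CombinationMetaDataset:
    each 'task' is identified by a tuple of classes (an N-way combination)."""
    def __init__(self, num_classes, num_ways):
        self.num_classes = num_classes
        self.num_ways = num_ways

    def __len__(self):
        # Number of possible N-way tasks, not number of examples
        return comb(self.num_classes, self.num_ways)

    def __getitem__(self, index):
        # Indexed by a tuple of classes, not an integer --
        # the default DataLoader sampler (which yields ints) would break here
        assert isinstance(index, tuple), "expects a tuple of class indices"
        return {"task_classes": index}

class ToyCombinationSampler:
    """Hypothetical sampler yielding tuples of classes instead of integers --
    the role played by the custom sampler inside a meta data-loader."""
    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        return iter(combinations(range(self.dataset.num_classes),
                                 self.dataset.num_ways))

dataset = ToyCombinationDataset(num_classes=5, num_ways=2)
sampler = ToyCombinationSampler(dataset)
tasks = [dataset[idx] for idx in sampler]
print(len(tasks))                 # 10 possible 2-way tasks from 5 classes
print(tasks[0]["task_classes"])   # (0, 1)
```

This is only a sketch of the indexing scheme; Torchmeta's real sampler and dataset classes carry the actual data loading on top of this idea.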

To summarize, the datasets are indeed normal PyTorch datasets, in that they have the same API, but they use a different indexing scheme, which requires different functions (sampler and collate) in the DataLoader. This could be made more explicit in the documentation.
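The collate side of the story can be sketched the same way (pure Python; `collate_tasks` is a hypothetical illustration, not Torchmeta's actual collate function). The default collate assumes each element is a single (input, target) example, whereas here each element is an entire task, so collating has to stack tasks along a new batch-of-tasks dimension:

```python
def collate_tasks(tasks):
    """Hypothetical collate for a batch of tasks: each element is a whole
    task (here a dict of per-task example lists), and collating gathers
    them into a batch-of-tasks structure instead of a batch of examples."""
    return {
        "inputs": [t["inputs"] for t in tasks],
        "targets": [t["targets"] for t in tasks],
    }

batch = collate_tasks([
    {"inputs": [1, 2], "targets": [0, 1]},   # task 1: 2 examples
    {"inputs": [3, 4], "targets": [0, 1]},   # task 2: 2 examples
])
print(batch["inputs"])   # [[1, 2], [3, 4]] -- batch dim indexes tasks
```

In this reading, BatchMetaDataLoader is plausibly just torch.utils.data.DataLoader configured with a tuple-yielding sampler and a task-level collate function like the one above.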