learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

Is there a way to support PyTorch iterable datasets? #366

Closed. patricks-lab closed this issue 1 year ago

patricks-lab commented 1 year ago

I was just wondering if it is possible for learn2learn to support PyTorch IterableDatasets as input (that is, PyTorch datasets that implement __iter__() rather than __getitem__() and __len__())?

My current understanding is that l2l only accepts PyTorch map-style datasets (i.e., datasets that implement __getitem__() and __len__()): when I try to pass an IterableDataset iterable_dataset into, say, l2l.data.MetaDataset(iterable_dataset), it complains that __len__() and __getitem__() are not implemented.
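For reference, a minimal sketch of the two PyTorch dataset styles (the class names and data arguments below are illustrative, not part of learn2learn):

```python
from torch.utils.data import Dataset, IterableDataset

class MyMapDataset(Dataset):
    """Map-style: random access via an integer index."""
    def __init__(self, samples, labels):
        self.samples, self.labels = samples, labels

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index], self.labels[index]

class MyIterableDataset(IterableDataset):
    """Iterable-style: sequential access only, no indexing and no length."""
    def __init__(self, sample_stream):
        self.sample_stream = sample_stream

    def __iter__(self):
        for sample, label in self.sample_stream:
            yield sample, label

# l2l.data.MetaDataset needs random access to build its class bookkeeping,
# so only the map-style dataset can be wrapped:
# meta_dataset = l2l.data.MetaDataset(MyMapDataset(samples, labels))
```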

To give some more context, I have tried to implement __len__() and __getitem__() for a PyTorch IterableDataset version of Meta-Dataset (https://github.com/mboudiaf/pytorch-meta-dataset) that I am currently trying to pass into learn2learn, but I'm not sure it is possible because Meta-Dataset is very large (~4934 classes and ~220 GB in total), so index-based access via __len__() and __getitem__() seems infeasible.

Thanks in advance for your guidance!

seba-1511 commented 1 year ago

Thanks for opening the issue @patricks-lab. We don't support IterableDatasets, and I don't know of an easy way to do so.

For Meta-Dataset, however, you could write a map-style dataset, plus a class that inherits from l2l's MetaDataset while exposing the same interface (the class-to-indices and indices-to-classes mappings). This worked for me when few-shot training on ImageNet.
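A rough sketch of what such a map-style wrapper could look like; the path index, class names, and helper functions below are assumptions for illustration, not the actual Meta-Dataset loader, and the keyword arguments to MetaDataset should be checked against your learn2learn version:

```python
from PIL import Image
from torch.utils.data import Dataset

class MetaDatasetOnDisk(Dataset):
    """Map-style dataset over a pre-built index of (image_path, label) pairs.

    Only the file paths and labels are kept in memory; images are decoded
    lazily in __getitem__, so the ~220 GB of data is never loaded at once.
    """
    def __init__(self, index, transform=None):
        # `index` is a list of (image_path, class_id) tuples built once by
        # scanning the dataset directory (hypothetical pre-processing step).
        self.index = index
        self.transform = transform

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        path, label = self.index[i]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label

def build_bookkeeping(index):
    """Build class <-> index mappings from the path index, so MetaDataset
    does not have to iterate over every image to discover labels."""
    labels_to_indices, indices_to_labels = {}, {}
    for i, (_, label) in enumerate(index):
        labels_to_indices.setdefault(label, []).append(i)
        indices_to_labels[i] = label
    return labels_to_indices, indices_to_labels

# Assuming MetaDataset accepts precomputed bookkeeping (verify against your
# installed learn2learn version):
# import learn2learn as l2l
# labels_to_indices, indices_to_labels = build_bookkeeping(index)
# meta_dataset = l2l.data.MetaDataset(
#     MetaDatasetOnDisk(index, transform),
#     labels_to_indices=labels_to_indices,
#     indices_to_labels=indices_to_labels,
# )
```

The key design point is that __getitem__ only touches one file per call, so implementing the map-style interface does not require holding the dataset in memory, only its index of paths and labels.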

patricks-lab commented 1 year ago

Do you happen to have example code somewhere for your ImageNet implementation (or any pointers on) how to write such a map-style dataset with the ability to map classes to indices? Thanks a lot again!