learnables / learn2learn

A PyTorch Library for Meta-learning Research
http://learn2learn.net
MIT License

learn2learn (l2l) data loader for Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples? #286

Open brando90 opened 2 years ago

brando90 commented 2 years ago

Hi,

I was wondering if there is an l2l data loader for Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples?

references:

Ideally it would come with an l2l model example, which would be fantastic!

seba-1511 commented 2 years ago

Hi @brando90,

We already have a few of the datasets of Meta-Dataset in l2l.vision.datasets. The remaining ones are work-in-progress.

brando90 commented 2 years ago

If there are issues for the remaining ones that I could help with, let me know. If it becomes part of my critical path, I'm happy to help.

brando90 commented 2 years ago

Hi @seba-1511, what would be the next steps to get this working in l2l?

seba-1511 commented 2 years ago

We're missing MS COCO and ILSVRC. For MS COCO we should provide a class to download the data (like the other datasets) but for ILSVRC it'd be enough to only have the splits.

brando90 commented 2 years ago

Why would only having the splits be enough for ILSVRC, without also getting the data?

Thanks for the quick response!

brando90 commented 2 years ago

perhaps this is a good place to start: https://github.com/mboudiaf/pytorch-meta-dataset

brando90 commented 2 years ago

@seba-1511 Hi Seba! I'm trying to figure out how I'd implement an l2l BenchmarkTasksets for Meta-Dataset, to use with the distributed MAML example you gave us (which I think would work for all settings that use episodic meta-learning).

Is all I need the following:

  1. Implement a standard PyTorch classification dataset, e.g.
    # Load task-specific data and transforms
    datasets, transforms = _TASKSETS[name](train_ways=train_ways,
                                           train_samples=train_samples,
                                           test_ways=test_ways,
                                           test_samples=test_samples,
                                           root=root,
                                           device=device,
                                           **kwargs)
    train_dataset, validation_dataset, test_dataset = datasets
    train_transforms, validation_transforms, test_transforms = transforms
  2. Then pass that dataset object to TaskDataset.
  3. Then return the task sets as a BenchmarkTasksets: return BenchmarkTasksets(train_tasks, validation_tasks, test_tasks)

So I only need to implement a normal PyTorch dataset for Meta-Dataset (in particular, the part that returns a pair (x, y)), and I think your code takes care of the rest. Right?

code example of above: https://github.com/learnables/learn2learn/blob/36ac4fb8f91fec291becb4757720a82c72c550b8/learn2learn/vision/benchmarks/__init__.py#L54
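The three steps above can be sketched without any dependencies. This is only an illustration of the idea, not l2l's implementation: `ToyDataset`, `EpisodeSampler`, `Tasksets`, and `get_toy_tasksets` are hypothetical names invented here. In l2l itself, the sampling role is played by `TaskDataset` together with task transforms such as `NWays`, `KShots`, `LoadData`, and `RemapLabels`.

```python
import random
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a train/validation/test container of task sets.
Tasksets = namedtuple("Tasksets", ("train", "validation", "test"))


class ToyDataset:
    """Step 1: a plain classification dataset; indexing returns (x, y)."""

    def __init__(self, num_classes, per_class, label_offset=0):
        self.items = [
            ((c, i), label_offset + c)  # x is a dummy (class, index) tuple
            for c in range(num_classes)
            for i in range(per_class)
        ]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


class EpisodeSampler:
    """Step 2: samples N-way K-shot tasks from any dataset yielding (x, y)."""

    def __init__(self, dataset, ways, shots, seed=0):
        self.dataset = dataset
        self.ways = ways
        self.shots = shots
        self.rng = random.Random(seed)
        # Index the dataset once: label -> list of sample indices.
        self.labels_to_indices = defaultdict(list)
        for idx in range(len(dataset)):
            _, y = dataset[idx]
            self.labels_to_indices[y].append(idx)

    def sample(self):
        classes = self.rng.sample(sorted(self.labels_to_indices), self.ways)
        xs, ys = [], []
        for new_label, cls in enumerate(classes):
            for idx in self.rng.sample(self.labels_to_indices[cls], self.shots):
                x, _ = self.dataset[idx]
                xs.append(x)
                ys.append(new_label)  # remap labels to 0..ways-1
        return xs, ys


def get_toy_tasksets(ways=5, shots=2):
    """Step 3: bundle the three samplers; class ranges are disjoint splits."""
    train = EpisodeSampler(ToyDataset(20, 10, label_offset=0), ways, shots)
    valid = EpisodeSampler(ToyDataset(8, 10, label_offset=20), ways, shots)
    test = EpisodeSampler(ToyDataset(8, 10, label_offset=28), ways, shots)
    return Tasksets(train, valid, test)


tasksets = get_toy_tasksets()
xs, ys = tasksets.train.sample()
```

The point of the sketch is that the dataset only has to support `len()` and `(x, y)` indexing; everything episodic lives in the sampler that wraps it.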

PS: I think it would be the same for the IBM dataset; it just needs a dataset object.

seba-1511 commented 2 years ago

Hello @brando90,

Yes, I think this would do it. Note that to get results comparable with published numbers, you might have to implement varying shot numbers, as described in their paper. This can be done with TaskTransforms and should be pretty straightforward.
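As a rough illustration of what "varying shot numbers" could mean, here is a dependency-free sketch that draws a random number of ways and a random number of support examples per class for each episode. It is a simplified stand-in and not the exact procedure from the Meta-Dataset paper: the function name `sample_variable_episode`, the bounds, and the uniform draws are all assumptions made for this example. In l2l, similar logic would presumably live in a custom task transform.

```python
import random


def sample_variable_episode(labels_to_indices, rng, min_ways=5, max_ways=50,
                            max_support=10, query_per_class=2):
    """Sample one episode with random ways and random per-class shots.

    Illustration only: bounds and uniform draws are assumptions, not the
    published Meta-Dataset sampling rules.
    """
    # A class qualifies if it can fill the query set plus one support example.
    eligible = [c for c, idxs in labels_to_indices.items()
                if len(idxs) >= query_per_class + 1]
    upper = min(max_ways, len(eligible))
    if upper < min_ways:
        raise ValueError("not enough eligible classes for min_ways")
    ways = rng.randint(min_ways, upper)
    classes = rng.sample(sorted(eligible), ways)

    support, query = [], []
    for new_label, cls in enumerate(classes):
        idxs = list(labels_to_indices[cls])
        rng.shuffle(idxs)
        # Fixed-size query set first, then a variable-size support set.
        query_idxs = idxs[:query_per_class]
        available = len(idxs) - query_per_class
        shots = rng.randint(1, min(max_support, available))
        support_idxs = idxs[query_per_class:query_per_class + shots]
        support += [(i, new_label) for i in support_idxs]
        query += [(i, new_label) for i in query_idxs]
    return support, query


# Toy index: 12 classes holding between 4 and 15 sample indices each.
rng = random.Random(0)
labels_to_indices = {}
next_index = 0
for cls in range(12):
    size = 4 + cls
    labels_to_indices[cls] = list(range(next_index, next_index + size))
    next_index += size

support, query = sample_variable_episode(labels_to_indices, rng)
```

Each entry in `support` and `query` is a `(sample_index, episode_label)` pair, so loading the actual data can stay a separate step.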

Good luck!

brando90 commented 2 years ago

Related: https://github.com/learnables/learn2learn/issues/301, although that one talks about how to write a dataloader for SL with l2l using the dataset object.

brando90 commented 2 years ago

@seba-1511 Hi Seba! Sorry for the random ping. How do you suggest one goes about implementing Meta-Dataset for l2l?

Would it be a good idea to download the data and then sample directly from the files, the way you do for mini-ImageNet? Or do you have any other suggestions?

AntreasAntoniou commented 1 year ago

I have actually implemented what I believe to be the Meta-Dataset episode sampling scheme in one of my current projects. It mainly uses l2l to get the datasets and then uses the episode sampler to create episodes.

It's a bit rough around the edges, but for the most part gets the job done.

If you give me until Friday I can come back with a PR for l2l to integrate that.