pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

New classification datasets support for FLAVA #5108

Closed NicolasHug closed 2 years ago

NicolasHug commented 2 years ago

To support our colleagues' work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.

CC-ing @pmeier and @jdsgomes as previously discussed. We're on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022. I'm also wondering whether this is something that our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀 ?

Implementing a new dataset

Implementing a dataset consists of 2 main things:

If there's some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.

For contributors

If you're interested in taking one of the datasets above, please comment below with "I'm working on dataset X" so that others don't pick the same! :)

cc @pmeier

pmeier commented 2 years ago

I'm going to take DTD and Oxford Pets.

FER2013: This is a Kaggle dataset, so I'm not sure we'll be able to support download (but maybe)

Nope. Downloads from Kaggle are currently not supported, since they require login. For now I would simply not add a download flag. Later in the new style datasets, we can provide them as

https://github.com/pytorch/vision/blob/eac3dc7bab436725b0ba65e556d3a6ffd43c24e1/torchvision/prototype/datasets/utils/_resource.py#L168
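To illustrate the "no download flag" idea, here is a minimal sketch of a dataset that expects the user to have fetched the files manually and fails with a clear message otherwise. Everything in it is an assumption for illustration: the class name `ManualDownloadDataset`, the file name `train.csv`, and the `label`/`pixels` columns are hypothetical, and this is not the torchvision implementation.

```python
import csv
import pathlib
from typing import Any, Callable, List, Optional, Tuple


class ManualDownloadDataset:
    """Hypothetical dataset whose files must be downloaded by hand.

    There is deliberately no ``download`` argument: the constructor only
    checks that the expected file is already present under ``root``.
    """

    _FILE_NAME = "train.csv"  # assumed file name, for illustration only

    def __init__(self, root: str, transform: Optional[Callable] = None) -> None:
        self.root = pathlib.Path(root)
        self.transform = transform

        data_file = self.root / self._FILE_NAME
        if not data_file.is_file():
            # Tell the user what to do instead of attempting a download.
            raise RuntimeError(
                f"{data_file} not found. This dataset cannot be downloaded "
                f"automatically; please download it manually and place it in {self.root}."
            )

        # Assumed CSV layout: a header row with "label" and "pixels" columns.
        with open(data_file, newline="") as f:
            self._samples: List[Tuple[str, int]] = [
                (row["pixels"], int(row["label"])) for row in csv.DictReader(f)
            ]

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, int]:
        pixels, label = self._samples[idx]
        if self.transform is not None:
            pixels = self.transform(pixels)
        return pixels, label
```

With this shape, instantiating the class on an empty directory raises a `RuntimeError` that points the user at the manual download step, and a download path could be added later without changing how samples are read.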

abhi-glitchhg commented 2 years ago

Can I try the Stanford cars dataset?

jdsgomes commented 2 years ago

I am taking Food 101 now.

fibbonnaci commented 2 years ago

I was planning on taking the Stanford Cars dataset. @abhi-glitchhg if you're taking it, then I'll try the Food101 dataset

fibbonnaci commented 2 years ago

I am taking Food 101 now.

Dang, I'm a few seconds late. I'll try PCAM then.

zhiqwang commented 2 years ago

I was planning on taking the Flowers-102.

oke-aditya commented 2 years ago

I am planning to take the SUN dataset. I'm unsure of my time and bandwidth, as I will be working from the office starting next month. Any contributor can supersede me :smile:

sumukhaithal6 commented 2 years ago

I am planning to work on the GTSRB dataset.

frgfm commented 2 years ago

Coming late to the party, but I'd be keen to take care of EuroSAT :+1: Glad to hear the dataset zoo is extending :smile:

pmeier commented 2 years ago

@ everyone who volunteered to take a dataset: thanks a lot! @NicolasHug will be out until next year, so feel free to ping me on PRs.

yiwen-song commented 2 years ago

I'll take FGVC-Aircraft :)

puhuk commented 2 years ago

I'll take Country211 :)

saswatpp commented 2 years ago

@oke-aditya Mind if I take the SUN dataset task, please ?

oke-aditya commented 2 years ago

Sure. Go ahead

pmeier commented 2 years ago

I would be grateful if someone is also up for adding their dataset in the upcoming new dataset style. I've just added #5133, which details how this should be done. So far no one besides the core team has worked on that, so we are actively looking for feedback on the contributor experience.

frgfm commented 2 years ago

I would be grateful if someone is also up for adding their dataset in the upcoming new dataset style. I've just added #5133, which details how this should be done. So far no one besides the core team has worked on that, so we are actively looking for feedback on the contributor experience.

Oh nice, I read about those prototypes and was curious to play around with it :grin: Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes? or do you mean changing already implemented ones to use the prototypes?

pmeier commented 2 years ago

Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes?

Exactly. Let me know if you hit any roadblocks as I'm eager to get feedback.

jdsgomes commented 2 years ago

Hello @zhiqwang 👋 Are you still planning to work on Flowers-102? If you are no longer interested or don't have time, that's obviously ok, but we can put it up for grabs since we are aiming to finish this month.

pmeier commented 2 years ago

Same for @fibbonnaci and the PCAM dataset.

zhiqwang commented 2 years ago

Hi @jdsgomes , I'm working on this now, and hope to submit the PR today.

NicolasHug commented 2 years ago

Thanks a lot for offering to help with the prototypes @frgfm . Let me know which one(s) you're trying to implement so we don't overlap :) . On my side I'll give GTSRB a try.

pmeier commented 2 years ago

Hey @fibbonnaci, PCAM is the last dataset that does not have a PR up yet. Are you working on that? If yes please push a PR even if you are not done, so we can help out and accelerate this. Otherwise, I'll send one myself.

NicolasHug commented 2 years ago

As discussed with @fibbonnaci offline, I'll take over the PCAM dataset.

NicolasHug commented 2 years ago

Looks like we're all done

Thank you so much to everyone who submitted a dataset, your help is much appreciated!

Tons of thanks to @pmeier in particular for all your help with submissions and the reviews!!