Closed NicolasHug closed 2 years ago
I'm going to take DTD and Oxford Pets.
FER2013: This is a Kaggle dataset, so I'm not sure we'll be able to support download (but maybe)
Nope. Downloads from Kaggle are currently not supported, since they require login. For now I would simply not add a `download` flag. Later in the new style datasets, we can provide them as
Can I try the Stanford cars dataset?
I am taking Food 101 now.
I was planning on taking the Stanford Cars dataset. @abhi-glitchhg if you're taking it, then I'll try the Food101 dataset
I am taking Food 101 now.
Dang, I'm a few seconds late. I'll try PCAM then.
I was planning on taking the Flowers-102.
I am planning to take the SUN dataset. I'm unsure of my time and bandwidth, as I will be working from the office starting next month. Any contributor can supersede me :smile:
I am planning to work on the GTSRB dataset.
Coming late to the party, but I'd be keen to take care of EuroSAT :+1: Glad to hear the dataset zoo is extending :smile:
@ everyone who volunteered to take a dataset: thanks a lot! @NicolasHug will be out until next year, so feel free to ping me on PRs.
I'll take FGVC-Aircraft :)
I'll take Country211 :)
@oke-aditya Mind if I take the SUN dataset task, please?
Sure. Go ahead
I would be grateful if someone were also up for adding their dataset in the upcoming new style of the datasets. I've just added #5133, which details how this should be done. So far no one besides the core team has worked on that, so we are actively looking for feedback on the contributor experience.
Oh nice, I read about those prototypes and was curious to play around with it :grin: Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes? or do you mean changing already implemented ones to use the prototypes?
Just to make sure I understand this: do you mean adding a second implementation of one of those datasets using the prototypes?
Exactly. Let me know if you hit any roadblocks as I'm eager to get feedback.
Hello @zhiqwang 👋 Are you still planning to work on the Flowers-102? If you are no longer interested or don't have time, that's obviously OK, but we can put it up for grabs since we are aiming to finish this month.
Same for @fibbonnaci and the PCAM dataset.
Hi @jdsgomes , I'm working on this now, and hope to submit the PR today.
Thanks a lot for offering to help with the prototypes @frgfm. Let me know which one(s) you're trying to implement so we don't overlap :). On my side I'll give GTSRB a try.
Hey @fibbonnaci, PCAM is the last dataset that does not have a PR up yet. Are you working on that? If yes please push a PR even if you are not done, so we can help out and accelerate this. Otherwise, I'll send one myself.
As discussed with @fibbonnaci offline, I'll take over the PCAM dataset.
Looks like we're all done
Thank you so much everyone who submitted a dataset, your help is much appreciated!
Tons of thanks to @pmeier in particular for all your help with submissions and the reviews!!
To support our colleagues' work on the FLAVA paper, and to foster collaborations in the multi-modal space, we would like to implement a few new datasets. Almost all of them are classification datasets but some also support other tasks like segmentation.
`target_type` parameter. @pmeier #5116

CC-ing @pmeier and @jdsgomes as previously discussed. We're on a fairly short timeline for this work, and ideally we would get all these in by end of January 2022. I'm also wondering whether this is something that our open source contributors @oke-aditya @frgfm @zhiqwang could be interested in 🚀 ?
Implementing a new dataset
Implementing a dataset consists of 2 main things:
Each dataset class should support the `root`, `split`, `transform` and `target_transform` parameters. When available, we should also support a `download` parameter (from what I checked, most of these are downloadable, apart maybe from FER2013). See e.g. the MNIST class.

If there's some ambiguity in the choices to make, the reference to follow is VISSL, where most of these datasets are already supported.
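As a rough illustration of the parameter convention above, here is a minimal sketch in plain Python. It is not torchvision's actual base class or any of the datasets listed in this issue: `ToyDataset`, the `<root>/toy/<split>/<class_name>/<file>` layout, and the two split names are all hypothetical. Files are returned as raw bytes so the sketch runs without torchvision or PIL, and `_download` is a placeholder only.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


class ToyDataset:
    """Hypothetical dataset following the root/split/transform/target_transform/download convention."""

    _SPLITS = ("train", "test")  # assumed split names, for illustration only

    def __init__(
        self,
        root: str,
        split: str = "train",
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
    ) -> None:
        if split not in self._SPLITS:
            raise ValueError(f"Unknown split {split!r}, expected one of {self._SPLITS}")
        self.root = Path(root)
        self.split = split
        self.transform = transform
        self.target_transform = target_transform
        # Assumed on-disk layout: <root>/toy/<split>/<class_name>/<file>
        self._split_dir = self.root / "toy" / split

        if download:
            self._download()
        if not self._check_exists():
            raise RuntimeError("Dataset not found. You can use download=True to download it.")

        self.classes = sorted(d.name for d in self._split_dir.iterdir() if d.is_dir())
        self.class_to_idx = {name: idx for idx, name in enumerate(self.classes)}
        self._samples = [
            (path, self.class_to_idx[name])
            for name in self.classes
            for path in sorted((self._split_dir / name).iterdir())
        ]

    def _check_exists(self) -> bool:
        return self._split_dir.is_dir()

    def _download(self) -> None:
        # Skip work if the files are already on disk.
        if self._check_exists():
            return
        # Placeholder: a real dataset would fetch and extract its archives here.
        raise NotImplementedError("This sketch has no real download source.")

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        path, target = self._samples[idx]
        sample = path.read_bytes()  # a real image dataset would decode the image here
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target
```

A dataset that cannot be downloaded automatically (the FER2013 case discussed in this thread) could simply omit the `download` argument and keep the existence check.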
For contributors
If you're interested in taking one of the datasets above, please comment below with "I'm working on dataset X" so that others don't pick the same! :)
cc @pmeier