pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] New datasets to torchvision #3562

Open oke-aditya opened 3 years ago

oke-aditya commented 3 years ago

🚀 Feature

This is a proposal to add more highly cited datasets. Thanks to Papers with Code datasets, which made this search easy.

Motivation

These datasets are used quite frequently and would benefit both researchers and people who work in computer vision. I'm not sure how reliable the citation metric is, but we can verify the paper counts.

Pitch

The following datasets can be considered. Paper counts are reported as per the last 5 years on Papers with Code. They can be inaccurate, so feel free to edit. I'm also adding previously approved or proposed ones.

See #5108

We should probably consider and add these one by one. We should also support downloading the dataset, not just loading it.
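To make the "downloading, not just loading" point concrete, here is a minimal sketch of a dataset class that takes a `download` flag and verifies a checksum before loading. It uses only the Python standard library. The class name `TinyDownloadableDataset`, its parameters, and the one-sample-per-line file layout are assumptions for illustration, not torchvision API.

```python
import hashlib
import urllib.request
from pathlib import Path


class TinyDownloadableDataset:
    """Hypothetical dataset (not a torchvision class): one text sample per line."""

    def __init__(self, root, url, sha256, filename, download=False):
        self.path = Path(root) / filename
        self.url = url
        self.sha256 = sha256
        if download:
            self._download()
        # Loading alone fails loudly when the file is missing or corrupted.
        if not self._check_integrity():
            raise RuntimeError(
                "Dataset not found or corrupted. Pass download=True to fetch it."
            )
        self.samples = self.path.read_text().splitlines()

    def _check_integrity(self):
        # The file must exist and match the expected SHA-256 digest.
        if not self.path.is_file():
            return False
        digest = hashlib.sha256(self.path.read_bytes()).hexdigest()
        return digest == self.sha256

    def _download(self):
        if self._check_integrity():
            return  # already present and verified, nothing to do
        self.path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(self.url, str(self.path))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]
```

With `download=False` the class only loads an existing, verified file. With `download=True` it fetches the file first and skips the fetch if a verified copy is already on disk.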

Additional context

Please feel free to discuss datasets before opening PRs!

cc @pmeier

fmassa commented 3 years ago

Hi,

This is exactly our current idea, thanks for bringing it up.

I agree with all the aforementioned proposals. One thing to mention as well is that there is an ongoing effort to provide new dataset abstractions in PyTorch via DataPipes https://github.com/pytorch/pytorch/issues/49440.

While this doesn't block us from providing new datasets, it is good to keep in mind that we might revisit the way we implement datasets in the future.
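As a rough illustration of what a composable, iterable-style dataset abstraction can look like, here is a sketch in plain Python. It is not the DataPipes API from the linked issue. The class names `IterablePipe`, `SourcePipe`, `MapPipe`, and `FilterPipe` are hypothetical and only show the general idea of chaining small stages.

```python
class IterablePipe:
    """Hypothetical base class: a stage that yields items and can be chained."""

    def __iter__(self):
        raise NotImplementedError

    def map(self, fn):
        # Return a new stage that applies fn to every item of this stage.
        return MapPipe(self, fn)

    def filter(self, predicate):
        # Return a new stage that keeps only items where predicate is true.
        return FilterPipe(self, predicate)


class SourcePipe(IterablePipe):
    """Wraps any iterable, e.g. a list of file names."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)


class MapPipe(IterablePipe):
    def __init__(self, source, fn):
        self.source = source
        self.fn = fn

    def __iter__(self):
        for item in self.source:
            yield self.fn(item)


class FilterPipe(IterablePipe):
    def __init__(self, source, predicate):
        self.source = source
        self.predicate = predicate

    def __iter__(self):
        for item in self.source:
            if self.predicate(item):
                yield item


# Keep only the .jpg entries, then upper-case them.
pipe = (
    SourcePipe(["a.jpg", "b.txt", "c.jpg"])
    .filter(lambda name: name.endswith(".jpg"))
    .map(str.upper)
)
result = list(pipe)  # ["A.JPG", "C.JPG"]
```

Each stage is lazy and re-iterable, so a dataset written this way can be extended by appending stages instead of subclassing.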

seyeeet commented 3 years ago

Related to this issue, it would also be useful if PyTorch can store these datasets on their storage and provide links to download them. E.g., there are a lot of issues with downloading ImageNet and other large datasets. I'm not sure if licensing can be problematic, but it would be super useful.

pmeier commented 3 years ago

@seyeeet

"I'm not sure if licensing can be problematic"

Yes, it is, and thus

"PyTorch can store these datasets on their storage and provide links to download them"

will never happen.

Also see this section in our README:

"This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license."

harishsdev commented 3 years ago

Based on what I see in "torchvision/datasets/", the datasets below still need to be added. Please update the pitch.

- LFW (Labeled Faces in the Wild)
- Market-1501 (492 papers)
- MPII Human Pose
- VGGFace2: earlier requested in #1193 and #2910. Here is tar.gz file. Hopefully we can add it.
- MovingMNIST: previously approved in #2676 and #2690.
- iNaturalist: #3292
- LVIS

pmeier commented 3 years ago

Hey @harishsdev, not sure what you mean. From the original pitch only KITTI was added, which is correctly marked. In your list you left out CUB-200-2011, which is not supported yet. We do feature the Caltech(101|256) datasets, but they are not related other than coming from the same university.

ABD-01 commented 3 years ago

Hi @harishsdev, I have created a PR for the LFW dataset. Can you guide me on any further changes?

jgbradley1 commented 3 years ago

The link provided for VGGFace2 is not correct; that link points to the first VGGFace dataset (which is available from this page).

oke-aditya commented 3 years ago

Actually, the tar.gz has been down for many months. I don't know what happened to VGG Face.

https://www.robots.ox.ac.uk/~vgg/data/vgg_face2

Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz

jgbradley1 commented 3 years ago

"Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz"

Respectfully, that is the wrong URL. The link you've provided is for the first version of VGGFace. The original pitch asked for VGGFace2, which will not be possible to provide at this time.

yassineAlouini commented 2 years ago

@oke-aditya Can we add the SmallNORB dataset to the list, as introduced in this PR: https://github.com/pytorch/vision/pull/492? Thanks in advance. :)

yassineAlouini commented 2 years ago

@oke-aditya Should we add the FGVC-Aircraft dataset (as implemented in this PR)?

pmeier commented 2 years ago

@yassineAlouini We already have FGVC-Aircraft in the current API:

https://github.com/pytorch/vision/blob/fb7f9a16628cb0813ac958da4525247e325cc3d2/torchvision/datasets/fgvc_aircraft.py#L12

as well as #5354 to track progress for porting it to the prototype one.