Open oke-aditya opened 3 years ago
Hi,
This is exactly what we have in mind; thanks for bringing it up.
I agree with all the aforementioned proposals. One thing worth mentioning is that there is an ongoing effort to provide new dataset abstractions in PyTorch via DataPipes https://github.com/pytorch/pytorch/issues/49440.
While this doesn't block us from providing new datasets, it is good to keep in mind that we might revisit the way we implement datasets in the future.
Related to this issue, it would also be useful if PyTorch could host these datasets on its own storage and provide links to download them. For example, there are a lot of issues with downloading ImageNet and other large datasets. I'm not sure whether licensing would be a problem, but it would be super useful.
@seyeeet
"im not sure if licensing can be problematic"
Yes, it is, and thus
"pytorch can store this datasets on their storage and provide link to download them"
will never happen.
Also see this section in our README:
This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.
Based on what is currently in "torchvision/datasets/", the datasets below still need to be added. Please update the pitch.
- LFW (Labeled Faces in the Wild)
- Market-1501: 492 papers
- MPII Human Pose
- VGGFace2: earlier requested in #1193 and #2910. Here is the tar.gz file; hopefully we can add it.
- MovingMNIST: previously approved in #2676 and #2690.
- iNaturalist: #3292
- LVIS
Hey @harishsdev, I'm not sure what you mean. From the original pitch, only KITTI was added, and it is correctly marked. Your list leaves out CUB-200-2011, which is not supported yet. We do feature the Caltech (101|256) datasets, but they are unrelated to CUB-200-2011 other than coming from the same university.
Hi @harishsdev, I have created a PR for the LFW dataset. Could you guide me on any further changes?
The link provided for VGGFace2 is not correct; it points to the first VGGFace dataset (which is available from this page).
Actually, the tar.gz has been down for many months. I don't know what happened to VGGFace.
https://www.robots.ox.ac.uk/~vgg/data/vgg_face2
Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz
Respectfully, that is the wrong URL. The link you've provided is for the first version of VGGFace. The original pitch asked for VGGFace2, which will not be possible to provide at this time.
@oke-aditya, can we add the SmallNORB dataset to the list, as introduced in this PR: https://github.com/pytorch/vision/pull/492? Thanks in advance. :)
@oke-aditya Should we add the FGVC-Aircraft dataset (as implemented in this PR)?
@yassineAlouini We already have FGVC-Aircraft in the current API, as well as #5354 to track progress on porting it to the prototype API.
🚀 Feature
This is a proposal to add more highly cited datasets. Thanks to Papers with Code Datasets, which made this search easy.
Motivation
These datasets are used quite frequently and would benefit both researchers and practitioners working in computer vision. I'm not sure how reliable the citation metric is, but we can double-check the paper counts.
Pitch
The following datasets can be considered. Paper counts are taken from Papers with Code for the last 5 years; they may be inaccurate, so feel free to edit them. I'm also adding previously approved or proposed datasets.
See #5108
We should probably consider and add these one by one, and also support downloading each dataset, not just loading it.
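To make "support downloading, not just loading" concrete, here is a minimal sketch of the pattern, assuming a dataset stored as a single file: check for a local copy, fetch it only when `download=True` and the copy is missing or corrupt, verify an MD5 checksum, then load. The class `HypotheticalDataset`, its URL, filename, and checksum are all hypothetical placeholders. torchvision's built-in datasets expose a similar `download` flag, but this is not torchvision's implementation.

```python
import hashlib
import os
import urllib.request


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HypotheticalDataset:
    """Hypothetical dataset that can fetch its own file (not a real torchvision class)."""

    # Placeholders: a real dataset would use its official host URL and published MD5.
    url = "https://example.com/hypothetical_dataset.txt"
    filename = "hypothetical_dataset.txt"
    md5 = None  # None skips the checksum comparison

    def __init__(self, root, download=False):
        self.root = root
        self.path = os.path.join(root, self.filename)
        # Only hit the network when asked to and the local copy is missing or corrupt.
        if download and not self._check_integrity():
            self._download()
        if not self._check_integrity():
            raise RuntimeError(
                "Dataset not found or corrupted. You can use download=True to download it."
            )
        # "Loading" in this sketch: each line of the file is one sample.
        with open(self.path, "rb") as f:
            self.samples = f.read().splitlines()

    def _check_integrity(self):
        if not os.path.isfile(self.path):
            return False
        return self.md5 is None or md5sum(self.path) == self.md5

    def _download(self):
        os.makedirs(self.root, exist_ok=True)
        # Fetch from the dataset's original host; the library itself mirrors nothing.
        urllib.request.urlretrieve(self.url, self.path)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]
```

Usage would look like `HypotheticalDataset("data/", download=True)`; later calls with `download=False` reuse the verified local file. Downloading from the original host, rather than from PyTorch-run storage, keeps this pattern in line with the README policy of not hosting or distributing datasets.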
Additional context
Please feel free to discuss datasets before opening PRs!
cc @pmeier