Detection Datasets in Torchvision

oke-aditya commented 3 years ago

🚀 Feature

Can we have the Penn-Fudan dataset in torchvision.datasets?

Motivation

We use this dataset very often in tutorials. It would be much easier to prototype if we had

torchvision.datasets.PenFudan(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Then we could easily load this dataset in VOC format.

Pitch

Penn-Fudan is very commonly used and is often the first dataset that comes to mind for detection and segmentation tasks. It is a simple dataset to prototype with instead of COCO. It would be really simple to load the data in VOC format, which is directly compatible with torchvision models. This would keep quickstarts and prototyping very fast, like CIFAR10 does.

I'm not sure about some aspects.

Should the targets start from 0 or 1? In torchvision we assume 0 to be the background, but that might not always be true.
Should we load the boxes in VOC format, or have a parameter to control that? We can use box_convert and return them in the format people need.
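
To make the two questions above concrete, here is a minimal pure-Python sketch of the box layouts involved and of the label shift. The helper names (xyxy_to_xywh, xyxy_to_cxcywh, shift_labels) are made up for illustration; in practice torchvision.ops.box_convert performs the same conversions on tensors with the format names "xyxy", "xywh" and "cxcywh".

```python
# Hand-written illustration only; these helpers are hypothetical and are not
# part of torchvision. Boxes start in VOC layout: [xmin, ymin, xmax, ymax].

def xyxy_to_xywh(box):
    """Corner layout -> [xmin, ymin, width, height] (the layout COCO uses)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xyxy_to_cxcywh(box):
    """Corner layout -> [center_x, center_y, width, height]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [x1 + w / 2, y1 + h / 2, w, h]


def shift_labels(labels, reserve_background=True):
    """Map 0-based class ids to 1-based ids when 0 is reserved for background."""
    return [label + 1 for label in labels] if reserve_background else list(labels)


voc_box = [10, 20, 50, 80]
print(xyxy_to_xywh(voc_box))
print(xyxy_to_cxcywh(voc_box))
print(shift_labels([0, 0, 1]))
```

A format parameter on the dataset could simply select one of these conversions, so the stored annotations would stay in a single layout.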

Alternatives

Currently, the Object Detection tutorial shows how to load and use it.

It would be a nice addition, as we don't have a handy detection dataset to prototype with (apart from COCO).

Additional context

I couldn't find the paper or its citation count, so I'm not sure whether that is needed in order to add it to torchvision.

cc @pmeier

fmassa commented 3 years ago

Hi,

Thanks for the suggestion!

I think one of the key requests from users when trying to fine-tune a model is how they should bring their own datasets for finetuning. As such, one of the main ingredients of the object detection finetuning tutorial is how to write a Dataset class that is compatible with the rest of the training abstractions that we provide. If the tutorial only contained torchvision.datasets.PenFudan(...) to get the data, the users would need to do more work to understand what they need to change to bring their own data.
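
As a rough illustration of what "a Dataset class that is compatible with the training abstractions" means, here is a minimal sketch of a map-style dataset. The class name InMemoryDetectionDataset and the record layout are assumptions made for this example. The torchvision detection models expect each target to be a dict with "boxes" in [xmin, ymin, xmax, ymax] order and "labels"; plain lists stand in for tensors here so that the sketch runs without PyTorch.

```python
class InMemoryDetectionDataset:
    """Hypothetical map-style dataset: it only needs __len__ and __getitem__.

    A real implementation would typically subclass torch.utils.data.Dataset,
    read images from disk, and return tensors instead of lists.
    """

    def __init__(self, records, transforms=None):
        # records: list of (image, boxes, labels) tuples, where each box is
        # [xmin, ymin, xmax, ymax] and each label is an integer class id.
        self.records = records
        self.transforms = transforms

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        image, boxes, labels = self.records[idx]
        # Copy the annotations so transforms cannot mutate the stored records.
        target = {"boxes": [list(box) for box in boxes], "labels": list(labels)}
        if self.transforms is not None:
            image, target = self.transforms(image, target)
        return image, target


dataset = InMemoryDetectionDataset([("image-0", [[10, 20, 50, 80]], [1])])
image, target = dataset[0]
print(len(dataset), target)
```

Users bringing their own data would mostly change how the records are read; the (image, target) contract stays the same.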

So if we were to provide PenFudan in torchvision, we would need to find another dataset for the tutorials.

As of now, I think Penn-Fudan is a good dataset to use in tutorials due to its simplicity (it doesn't provide boxes, so we have to compute them ourselves). Given that it only contains people as a class, it's not a very good benchmark for fine-tuning compared to Pascal or COCO (both datasets are much larger than Penn-Fudan and also contain the people class).
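
For readers unfamiliar with the "compute them ourselves" step, here is a minimal sketch of deriving boxes from an instance mask. It assumes a mask stored as a 2D grid of integers in which 0 is background and each positive value marks one instance; the function name boxes_from_instance_mask is made up for this example, and nested lists stand in for an image array.

```python
def boxes_from_instance_mask(mask):
    """Return one [xmin, ymin, xmax, ymax] box per instance id, sorted by id.

    Assumes mask[y][x] == 0 means background and any other value is an
    instance id. Coordinates are inclusive pixel indices.
    """
    extents = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            if instance_id not in extents:
                extents[instance_id] = [x, y, x, y]
            else:
                box = extents[instance_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return [extents[instance_id] for instance_id in sorted(extents)]


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
print(boxes_from_instance_mask(mask))  # [[1, 0, 2, 1], [3, 1, 4, 2]]
```

With tensors, the same idea is usually expressed by taking the minimum and maximum coordinates of the nonzero pixels of each per-instance binary mask.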

Datasets and standards

But this raises a separate (and very important) question as well: the datasets in torchvision do not have strong standardization with respect to output types, etc. This was already discussed in https://github.com/pytorch/vision/issues/1080 , but as I mentioned there, the more structure we add, the less flexible we are, and the more tied to a particular training loop we become. If all datasets were formatted the same way and had the same expected return types, then we could provide a PenFudan dataset in torchvision, as the specs for how a dataset should be formatted would always be the same and documented in a single place. But from the discussion in #1080, maybe this standardization is not something we should impose on the datasets.

cc @datumbox @pmeier for thoughts.

oke-aditya commented 3 years ago

Hi @fmassa, I have a few thoughts. Let's not focus on Penn-Fudan specifically but instead consider detection datasets in general.

I agree that the tutorial should show how to create a dataset in the correct input format for torchvision models. I think this is an integral part of the tutorial, so let's not change that.

Now we come back to the question of datasets and standards. Let me make this issue a bit more generic. Currently torchvision supports a lot of classification datasets (MNIST, LSUN, CIFAR, EMNIST, etc.). For object detection, it currently supports the COCO detection dataset.

So the question is:

How do we add new detection datasets to torchvision?

Penn-Fudan might not be the best dataset to add, but there are a few other datasets apart from COCO.

Objectron. A simple PyTorch notebook to load data is here.
Open Images dataset.

Possibly we could edit this list, taking citations and the context of contributing datasets into account. Datasets often come in different formats.

Some thoughts about standardization.

We cannot restrict datasets to load only in COCO or VOC format: different models need different formats, and torchvision provides datasets for common use cases, not just for loading into torchvision models.
We might not be able to provide train/test splits for these datasets, as that depends on the provider. If we do provide train and test splits, they might not be consistent, or users might want different ones.

I also suggest having a FakeDetectionDataset that can generate datasets in COCO, VOC and YOLO formats. This would be analogous to the FakeData dataset already in torchvision.
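
A rough sketch of what such a FakeDetectionDataset could look like is below. Nothing here exists in torchvision: the constructor arguments, the returned dict, and the seeding scheme are all assumptions made for illustration, and image generation is left out to keep the example dependency-free. The sketch takes VOC boxes to be [xmin, ymin, xmax, ymax], COCO boxes to be [xmin, ymin, width, height], and YOLO boxes to be [center_x, center_y, width, height] normalized by the image size.

```python
import random


class FakeDetectionDataset:
    """Hypothetical synthetic detection dataset (not a torchvision class)."""

    FORMATS = ("voc", "coco", "yolo")

    def __init__(self, size=10, image_size=(224, 224), num_classes=3,
                 max_boxes=4, box_format="voc", seed=0):
        if box_format not in self.FORMATS:
            raise ValueError(f"box_format must be one of {self.FORMATS}")
        self.size = size
        self.image_size = image_size  # (height, width)
        self.num_classes = num_classes
        self.max_boxes = max_boxes
        self.box_format = box_format
        self.seed = seed

    def __len__(self):
        return self.size

    def _encode(self, x1, y1, x2, y2):
        height, width = self.image_size
        if self.box_format == "voc":
            return [x1, y1, x2, y2]
        if self.box_format == "coco":
            return [x1, y1, x2 - x1, y2 - y1]
        # YOLO: normalized center and size.
        return [(x1 + x2) / 2 / width, (y1 + y2) / 2 / height,
                (x2 - x1) / width, (y2 - y1) / height]

    def __getitem__(self, idx):
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        # Seeding per index makes every sample reproducible on its own.
        rng = random.Random(self.seed + idx)
        height, width = self.image_size
        boxes, labels = [], []
        for _ in range(rng.randint(1, self.max_boxes)):
            x1 = rng.randint(0, width - 2)
            y1 = rng.randint(0, height - 2)
            x2 = rng.randint(x1 + 1, width - 1)
            y2 = rng.randint(y1 + 1, height - 1)
            boxes.append(self._encode(x1, y1, x2, y2))
            labels.append(rng.randint(1, self.num_classes))  # 0 kept for background
        return {"image_size": self.image_size, "boxes": boxes, "labels": labels}


sample = FakeDetectionDataset(size=2, box_format="yolo")[0]
print(sample["boxes"], sample["labels"])
```

Because the box format only affects the final encoding step, the same seed yields the same underlying boxes in all three formats, which could be handy for testing conversion code.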

oke-aditya commented 3 years ago

Closing this in favour of #3562.
