Closed oke-aditya closed 3 years ago
Hi,
Thanks for the suggestion!
I think one of the key requests from users trying to fine-tune a model is how to bring their own datasets for fine-tuning. As such, one of the main ingredients of the object detection fine-tuning tutorial is how to write a Dataset class that is compatible with the rest of the training abstractions that we provide.
If the tutorial only contained `torchvision.datasets.PenFudan(...)` to get the data, users would need to do more work to understand what they need to change to bring their own data.
So if we were to provide PenFudan in torchvision, we would need to find another dataset for the tutorials.
As of now, I think PenFudan is a good dataset to use in tutorials due to its simplicity (it doesn't provide boxes, so we have to compute them ourselves). Given that it only contains people as a class, it's not a very good benchmark for fine-tuning compared to Pascal or COCO (both datasets are much larger than PenFudan and also contain the people class).
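As a rough illustration of what "computing the boxes ourselves" involves, here is a minimal sketch. It assumes each object is marked in a segmentation mask with its own positive integer id and that 0 is background. The function name `boxes_from_instance_mask` is hypothetical, and plain Python lists stand in for the image array so the snippet runs without dependencies; a real Dataset class would more likely work on NumPy arrays or tensors.

```python
def boxes_from_instance_mask(mask):
    """Return {instance_id: (xmin, ymin, xmax, ymax)} for a 2D integer mask.

    Assumption: 0 is background and every other value marks one instance.
    """
    extents = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue
            if instance_id not in extents:
                # First pixel seen for this instance: the box is a single point.
                extents[instance_id] = [x, y, x, y]
            else:
                # Grow the box so that it covers the new pixel.
                box = extents[instance_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return {instance_id: tuple(box) for instance_id, box in extents.items()}


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
print(boxes_from_instance_mask(mask))  # {1: (1, 0, 2, 1), 2: (3, 1, 4, 2)}
```

The boxes come out as corner coordinates (xmin, ymin, xmax, ymax), which is the layout the Pitch refers to as VOC format.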
But this brings up a separate (and very important) question as well: the datasets in torchvision do not have strong standardization with respect to output types, etc. This was already discussed in https://github.com/pytorch/vision/issues/1080 , but as I mentioned there, the more structure we add, the less flexible we are, and the more tied to a particular training loop we become. If all datasets were formatted the same way and had the same expected return types, then we could provide a PenFudan dataset in torchvision, as the specs for how a dataset should be formatted would always be the same and documented in a single place. But from the discussion in #1080, maybe this standardization is not something we should impose on the datasets.
cc @datumbox @pmeier for thoughts.
Hi @fmassa, I have a few thoughts. Let's not be specific about Pen Fudan, but instead consider detection datasets in general.
I agree that the tutorial should show how to create a dataset in the correct input format for torchvision models. I think this is an integral part of the tutorial, and let's not change that.
Now we come back to the question of datasets and standards. Let me make this issue a bit more generic. Currently torchvision supports a lot of classification datasets (MNIST, LSUN, CIFAR, EMNIST, etc). For object detection, it currently supports the COCO detection dataset.
So the question is.
PenFudan might not be the best dataset to add, but there are a few other datasets apart from COCO.
Possibly we could edit this list, taking citations and the context around contributing datasets into account. Datasets often come in different formats.
Some thoughts about standardization.
`train` and `test` it might not be consistent or possibly users would like to have different.
I also suggest having a `FakeDetectionDataset` that can generate datasets in COCO, VOC and YOLO format. This is analogous to `FakeData`, already in torchvision.
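To make the suggestion more concrete, here is a minimal sketch of what such a class could look like. Everything in it is hypothetical: the constructor parameters, the `box_format` names, and the returned dictionary are assumptions for illustration, not an existing torchvision API. Image synthesis is left out; only random boxes and labels are generated. It assumes VOC means absolute (xmin, ymin, xmax, ymax), COCO means absolute (xmin, ymin, width, height), and YOLO means (center x, center y, width, height) normalized by the image size.

```python
import random


class FakeDetectionDataset:
    """Hypothetical sketch: random boxes and labels in a chosen annotation format."""

    FORMATS = ("voc", "coco", "yolo")

    def __init__(self, size=100, image_size=(224, 224), num_classes=10,
                 max_boxes=5, box_format="voc", seed=0):
        if box_format not in self.FORMATS:
            raise ValueError(f"box_format must be one of {self.FORMATS}")
        self.size = size
        self.width, self.height = image_size
        self.num_classes = num_classes
        self.max_boxes = max_boxes
        self.box_format = box_format
        self.seed = seed

    def __len__(self):
        return self.size

    def _encode(self, xmin, ymin, xmax, ymax):
        w, h = xmax - xmin, ymax - ymin
        if self.box_format == "voc":   # absolute corner coordinates
            return (xmin, ymin, xmax, ymax)
        if self.box_format == "coco":  # absolute top-left corner plus size
            return (xmin, ymin, w, h)
        # yolo: box center and size, normalized by the image dimensions
        return ((xmin + w / 2) / self.width, (ymin + h / 2) / self.height,
                w / self.width, h / self.height)

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        rng = random.Random(self.seed + index)  # same index gives the same sample
        boxes, labels = [], []
        for _ in range(rng.randint(1, self.max_boxes)):
            xmin = rng.randint(0, self.width - 2)
            ymin = rng.randint(0, self.height - 2)
            xmax = rng.randint(xmin + 1, self.width - 1)
            ymax = rng.randint(ymin + 1, self.height - 1)
            boxes.append(self._encode(xmin, ymin, xmax, ymax))
            labels.append(rng.randrange(self.num_classes))
        return {"boxes": boxes, "labels": labels}


sample = FakeDetectionDataset(size=4, box_format="coco")[0]
print(sample["boxes"], sample["labels"])
```

Seeding by index keeps samples reproducible, so the same fake annotations can be emitted in all three formats and compared against each other in tests.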
Closing this in favour of #3562.
🚀 Feature
Can we have Pen Fudan Dataset in `torchvision.datasets`?
Motivation
We use this dataset very often in tutorials! It is much easier to prototype if we have
then we could easily load this dataset in VOC Format
Pitch
It is very commonly used and the first one to think of when it comes to detection and segmentation tasks. Pen Fudan is a simple dataset to prototype with instead of COCO. It would be really simple to load the data in VOC format, which is directly compatible with torchvision models. This would keep quickstart and prototyping very fast, like CIFAR10 does!
I'm not sure of some aspects.
`box_convert` and return in format people need.
Alternatives
Currently, the tutorial for object detection shows how to load and use it.
It would be a nice addition, as we don't have a handy detection dataset to prototype with (apart from COCO).
Additional context
I couldn't find the paper and citation count, so I'm not sure whether those are needed to add a dataset to torchvision.
cc @pmeier