pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Extend FakeDatasets for All Tasks #6993

oke-aditya commented 1 year ago

πŸš€ The feature

Fake datasets help us quickly verify that an instantiated model runs end to end. This enables quick validation for testing as well as faster prototyping.

We already have datasets.FakeData in torchvision for this, but as of now it only supports image classification.
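
For context, the existing dataset is used roughly like this:

```python
from torchvision import datasets

# The existing fake dataset: classification only; samples are (PIL image, class index).
dataset = datasets.FakeData(size=4, image_size=(3, 224, 224), num_classes=10)
img, target = dataset[0]
print(type(img), target)  # <class 'PIL.Image.Image'> <some int in [0, 10)>
```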

Motivation, pitch

We should have multiple FakeData classes, e.g. ImageClassificationFakeData and ObjectDetectionFakeData (see the sketch below).

If we do that, we should deprecate FakeData in favor of ImageClassificationFakeData.

We could also think about supporting different formats, e.g. xywh as well as xyxy bounding boxes, or binary/boolean masks. (Not sure about this; it needs discussion.)
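
To make the pitch concrete, a detection variant could look roughly like the sketch below. ObjectDetectionFakeData is hypothetical and its entire signature is an assumption; the target format just mirrors what torchvision detection models expect (a CHW float image plus a dict with xyxy boxes and labels):

```python
import torch
from torch.utils.data import Dataset

class ObjectDetectionFakeData(Dataset):
    # Hypothetical class from the pitch above; signature and defaults are
    # assumptions, not an agreed-upon API.
    def __init__(self, size=8, image_size=(3, 224, 224), num_classes=91, max_boxes=4):
        self.size = size
        self.image_size = image_size
        self.num_classes = num_classes
        self.max_boxes = max_boxes

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if index >= len(self):
            raise IndexError(f"{type(self).__name__} index out of range")
        g = torch.Generator().manual_seed(index)
        img = torch.rand(self.image_size, generator=g)
        num = int(torch.randint(1, self.max_boxes + 1, (1,), generator=g))
        _, h, w = self.image_size
        # Random xyxy boxes guaranteed to lie inside the image.
        xy = torch.rand(num, 2, generator=g) * torch.tensor([w / 2, h / 2])
        wh = 1.0 + torch.rand(num, 2, generator=g) * torch.tensor([w / 2 - 1, h / 2 - 1])
        target = {
            "boxes": torch.cat([xy, xy + wh], dim=1),
            "labels": torch.randint(1, self.num_classes, (num,), generator=g),
        }
        return img, target
```

Since detection models in training mode take lists of images and targets, a smoke test would call something like model([img], [target]).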

Alternatives

Other libraries maintain something similar to test models. These libraries mostly wrap torchvision models into their framework equivalents and test them on sample datasets.

https://github.com/Lightning-AI/lightning-bolts/blob/master/tests/models/test_detection.py

https://github.com/Lightning-AI/lightning-flash/blob/master/tests/image/detection/test_model.py

https://github.com/oke-aditya/quickvision/blob/master/tests/dataset_utils.py

Additional context

@pmeier please chip in your thoughts!

cc @pmeier

pmeier commented 1 year ago

If we want a fake dataset for all of our models, we need to cover every task our models support.

What does a sample look like for each of these tasks?

We already have datasets.FakeData in torchvision for this, but as of now it only supports image classification.

That is only partly true. datasets.FakeData returns Tuple[PIL.Image.Image, int], but we can't pass that into a model directly. You will have to convert the PIL image to a tensor.
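
For example (resnet18 is just an arbitrary model picked for illustration):

```python
import torch
from torchvision import datasets, models
from torchvision.transforms import functional as F

dataset = datasets.FakeData(size=2, image_size=(3, 224, 224), num_classes=10)
img, target = dataset[0]               # img is a PIL.Image.Image, target an int

model = models.resnet18(num_classes=10).eval()
# model(img) would fail; convert the PIL image to a batched tensor first.
batch = F.to_tensor(img).unsqueeze(0)  # shape (1, 3, 224, 224)
with torch.no_grad():
    print(model(batch).shape)          # torch.Size([1, 10])
```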

Plus, FakeData also supports the transform and target_transform parameters, which somewhat contradicts your assessment that it is meant for model validation. If it were, there would be no need for any transforms, because the data would already be in the right shape.

We could also think about supporting different formats, e.g. xywh as well as xyxy bounding boxes, or binary/boolean masks. (Not sure about this; it needs discussion.)

I don't think this is necessary. If the goal is to validate a model, we only need to provide datasets for inputs that our models support.

For testing, we already have functions to create all sorts of images, bounding boxes, masks, and videos in https://github.com/pytorch/vision/blob/main/test/prototype_common_utils.py. Of course we would need to strip out all the torchvision.prototype stuff, but otherwise we can use that. TBH, I have wanted a torchvision.testing namespace for quite some time, with functions like make_image_tensor, make_image_pil, make_bounding_box, and the like. That would also make it quite a bit easier for third parties to test their code. But this is a different discussion.
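
For illustration, such a namespace could expose helpers roughly like the ones below; the function names come from the paragraph above, but every signature here is an assumption:

```python
import torch
from PIL import Image

def make_image_tensor(size=(3, 224, 224), dtype=torch.float32, seed=0):
    # Random image tensor: values in [0, 1] for float dtypes, full range for ints.
    g = torch.Generator().manual_seed(seed)
    if dtype.is_floating_point:
        return torch.rand(size, generator=g, dtype=dtype)
    return torch.randint(0, torch.iinfo(dtype).max + 1, size, generator=g, dtype=dtype)

def make_image_pil(size=(3, 224, 224), seed=0):
    # Random PIL image, built from a uint8 CHW tensor.
    t = make_image_tensor(size, dtype=torch.uint8, seed=seed)
    return Image.fromarray(t.permute(1, 2, 0).numpy())

def make_bounding_box(image_size=(224, 224), num_boxes=4, seed=0):
    # Random xyxy boxes that fit inside image_size == (height, width).
    g = torch.Generator().manual_seed(seed)
    h, w = image_size
    xy = torch.rand(num_boxes, 2, generator=g) * torch.tensor([w / 2, h / 2])
    wh = torch.rand(num_boxes, 2, generator=g) * torch.tensor([w / 2, h / 2])
    return torch.cat([xy, xy + wh], dim=1)
```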

oke-aditya commented 1 year ago

Sorry for the delay; I haven't thought much about this yet.

That is only partly true. datasets.FakeData returns Tuple[PIL.Image.Image, int], but we can't pass that into a model directly. You will have to convert the PIL image to a tensor.

Can we return a Tensor from the dataset, or a dict[str, Tensor]? That would not require transform and target_transform, since they aren't necessary for this use case.

pmeier commented 1 year ago

For all new datasets, we can return whatever we want. Of course, the output should match what the model downstream needs, otherwise the dataset is not useful.

For image classification, we need to keep backwards compatibility. We could add an image_tensor: bool = False flag to enable returning tensors directly, but PIL images should remain the default. If we feel strongly that tensors should be the default, we could also go through a deprecation cycle, but IMO there is not enough traction for that so far.
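
A minimal sketch of that flag, written as a subclass purely for illustration (only the image_tensor name comes from the proposal above; the real change would go into FakeData itself):

```python
from torchvision import datasets
from torchvision.transforms import functional as F

class FakeDataSketch(datasets.FakeData):
    def __init__(self, *args, image_tensor: bool = False, **kwargs):
        super().__init__(*args, **kwargs)
        self.image_tensor = image_tensor

    def __getitem__(self, index):
        img, target = super().__getitem__(index)  # BC default: (PIL image, int)
        if self.image_tensor:
            # Assumes no transform was passed; a real implementation would
            # convert before applying self.transform.
            img = F.to_tensor(img)
        return img, target
```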