pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Move GroupedBatchSampler into torchvision #3714

Open bradezard131 opened 3 years ago

bradezard131 commented 3 years ago

🚀 Feature

Move the GroupedBatchSampler into the core library

Motivation

Grouping minibatch elements is often useful for vision tasks, especially object detection and segmentation, where images commonly have different shapes and aspect ratios. Torchvision already has a sampler that does this, but it lives only in the references, not in the library itself. As a result, users have to either copy-paste it into their own code or import detectron2 to utilise it.

Pitch

Probably just copy the entire group_by_aspect_ratio.py into torchvision/datasets/samplers, splitting it up if necessary. I think this would be useful for people working on detection or segmentation tasks who don't want to bring in all of d2.
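For readers unfamiliar with the idea, here is a minimal sketch of what a grouped batch sampler does. It is written in plain Python and is not the code from group_by_aspect_ratio.py. The names `aspect_ratio_group_ids` and `SimpleGroupedBatchSampler` are hypothetical, and the choice to drop incomplete batches at the end is an assumption of this sketch that may differ from the reference script.

```python
import bisect
from collections import defaultdict


def aspect_ratio_group_ids(sizes, bin_edges=(1.0,)):
    """Map each (width, height) pair to the index of its aspect-ratio bin.

    `bin_edges` must be sorted; with the default single edge at 1.0,
    portrait images fall in group 0 and square/landscape images in group 1.
    """
    return [bisect.bisect_right(bin_edges, w / h) for w, h in sizes]


class SimpleGroupedBatchSampler:
    """Hypothetical sketch: yield batches of indices that share a group id."""

    def __init__(self, sampler, group_ids, batch_size):
        self.sampler = sampler        # any iterable of dataset indices
        self.group_ids = group_ids    # group_ids[i] is the group of sample i
        self.batch_size = batch_size

    def __iter__(self):
        buffers = defaultdict(list)   # one pending batch per group
        for idx in self.sampler:
            buf = buffers[self.group_ids[idx]]
            buf.append(idx)
            if len(buf) == self.batch_size:
                yield list(buf)
                buf.clear()
        # Assumption of this sketch: incomplete batches are dropped.


# Illustrative image sizes as (width, height); not taken from any real dataset.
sizes = [(640, 480), (480, 640), (800, 600), (600, 800), (1024, 768)]
group_ids = aspect_ratio_group_ids(sizes)
batches = list(SimpleGroupedBatchSampler(range(len(sizes)), group_ids, batch_size=2))
```

Because the class only yields lists of indices, an object like this could in principle be passed as the `batch_sampler` argument of a `torch.utils.data.DataLoader`, although a real implementation would also want a `__len__`.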

datumbox commented 3 years ago

We'll need to take a closer look at the API of the class to ensure we can commit to it, but in general I'm in favour of providing necessary tools such as this within TorchVision.

fmassa commented 3 years ago

Hi,

I believe this will be covered by the rewrite of datasets using DataPipes, in particular GrouperIterDataPipe, so that you can just do `dataset.groupby(lambda x: ...)`. The good thing about the DataPipe-based implementation is that it doesn't require knowing the groups of all the elements in the dataset ahead of time (they can be computed on-the-fly).

I believe once we provide new datasets based on datapipes (which should happen within the next few months), we will be able to close this.
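To illustrate the on-the-fly grouping mentioned in this comment, here is a rough plain-Python sketch. It deliberately does not use the DataPipes API. The function `group_on_the_fly`, the sample dictionaries, and the landscape/portrait key are all hypothetical stand-ins. The point is that the key is computed per sample as it streams past, so no pass over the whole dataset is needed beforehand.

```python
from collections import defaultdict


def group_on_the_fly(samples, key_fn, group_size):
    """Yield lists of up to `group_size` samples that share the same key.

    The key is computed lazily for each sample as it arrives, so the
    groups of the whole dataset never need to be known ahead of time.
    """
    buffers = defaultdict(list)
    for sample in samples:
        key = key_fn(sample)
        buffers[key].append(sample)
        if len(buffers[key]) == group_size:
            yield buffers.pop(key)
    # Flush whatever is left once the stream is exhausted.
    for leftover in buffers.values():
        if leftover:
            yield leftover


def stream():
    # Hypothetical sample stream; in practice this would decode images lazily.
    dims = [(640, 480), (480, 640), (800, 600), (600, 800), (1024, 768)]
    for i, (w, h) in enumerate(dims):
        yield {"id": i, "width": w, "height": h}


# Group by orientation: True for landscape or square, False for portrait.
groups = list(group_on_the_fly(stream(), lambda s: s["width"] >= s["height"], 2))
group_ids = [[s["id"] for s in g] for g in groups]
```

Unlike a sampler-based approach, this version emits the leftover partial groups at the end. Whether to keep or drop them is a design choice for the caller.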

datumbox commented 3 years ago

Sounds good to me, thanks @fmassa for providing the context. I'll remove it from the Batteries Included list then.

dreamflasher commented 2 years ago

Is this now supported by https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes/iter/grouping.py#L170 and how would you group by aspect ratio with DataPipes?