Open ydcjeff opened 3 years ago
Hi,
Thanks for opening this issue.
This has been on our radar for a while already, but we never really managed to find the right balance between simplicity and generality for the API. For example, the API you proposed wouldn't be enough if we wanted to work on image + boxes + keypoints, or even image + segmentation map, so we would need a number of repeated implementations to cover the models in torchvision.
For an earlier attempt at the APIs, see https://github.com/pytorch/vision/issues/1406 and the discussion within.
I would love to hear your thoughts on this.
FYI, it seems that the existing `batch_images` in `GeneralizedRCNNTransform` plays the same role as the proposed `LetterBox` here.
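For readers unfamiliar with the term, here is a rough sketch of the geometry a letterbox-style transform has to compute. It assumes "LetterBox" means the common resize-keeping-aspect-ratio-then-pad operation with centered padding; the thread does not define it, and the function names `letterbox_params` and `letterbox_boxes` are hypothetical, not torchvision APIs.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def letterbox_params(width: int, height: int, target_w: int, target_h: int):
    """Return (scale, new_w, new_h, pad_left, pad_top) for a centered letterbox.

    Assumption: the image is scaled uniformly so it fits inside the target,
    and the leftover space is split evenly between both sides.
    """
    scale = min(target_w / width, target_h / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def letterbox_boxes(boxes: List[Box], scale: float, pad_left: int, pad_top: int) -> List[Box]:
    """Apply the same scale and offset to the boxes so they stay aligned with the image."""
    return [
        (x1 * scale + pad_left, y1 * scale + pad_top,
         x2 * scale + pad_left, y2 * scale + pad_top)
        for x1, y1, x2, y2 in boxes
    ]
```

The point of the sketch is only that the boxes must receive the same scale and offset as the pixels; how the image itself is resized and padded is left out.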
🚀 Feature
I would like to start adding/supporting transforms (both functional and class) for object detection. I know I can take some of them from the `references` folder, but it would be nice to have them out of the box (OOTB). Here are a few basic transforms I would like to add first:
- RandomHorizontalFlipWithBBox
- RandomVerticalFlipWithBBox
- LetterBox
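As an illustration of the functional half of the flip transforms, the box update could look roughly like the sketch below. It assumes boxes are given as `(x1, y1, x2, y2)` in absolute pixel coordinates; `hflip_boxes` and `vflip_boxes` are hypothetical helper names, and the image itself would be flipped separately (for example with `torchvision.transforms.functional.hflip` or `vflip`).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def hflip_boxes(boxes: List[Box], image_width: float) -> List[Box]:
    """Mirror boxes around the vertical axis of an image of the given width."""
    # The old right edge becomes the new left edge, so x1 and x2 swap roles.
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


def vflip_boxes(boxes: List[Box], image_height: float) -> List[Box]:
    """Mirror boxes around the horizontal axis of an image of the given height."""
    # The old bottom edge becomes the new top edge, so y1 and y2 swap roles.
    return [(x1, image_height - y2, x2, image_height - y1) for x1, y1, x2, y2 in boxes]
```

A random class version would simply decide with probability `p` whether to apply the image flip and the matching box flip together.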
Pitch
All of the above transforms will accept 2 arguments when they are called. This breaks the purpose of `Compose` and `nn.Sequential`, but aren't we currently writing a custom `Compose` or `nn.Sequential` anyway? So I think it's OK to start introducing the necessary transforms taking 2 arguments for detection, segmentation, etc., and let users write a custom `Compose` or `nn.Sequential` the way they would like to call the transforms.
Additional context
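To make the two-argument calling convention concrete, a user-written composition helper might look like the sketch below. `PairCompose` is a hypothetical name, and the only assumption is that every transform takes `(image, target)` and returns the updated pair.

```python
from typing import Any, Callable, Sequence, Tuple

# Assumed contract: each transform maps (image, target) -> (image, target).
PairTransform = Callable[[Any, Any], Tuple[Any, Any]]


class PairCompose:
    """Chain transforms that each consume and return an (image, target) pair."""

    def __init__(self, transforms: Sequence[PairTransform]):
        self.transforms = list(transforms)

    def __call__(self, image: Any, target: Any) -> Tuple[Any, Any]:
        # Thread both values through every transform in order.
        for transform in self.transforms:
            image, target = transform(image, target)
        return image, target
```

Usage would then be `image, target = PairCompose([t1, t2])(image, target)`, where `t1` and `t2` are any two-argument transforms. The limitation raised in the reply above still applies: a fixed pair does not directly cover image + boxes + keypoints unless `target` is a container such as a dict.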
Current code:
Thank you!
cc @vfdev-5, @fmassa