pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[Feature Request] Random Spatial Transforms #287

Open esube opened 6 years ago

esube commented 6 years ago

I just saw the refactored transforms and they look much better. I had issues with the PIL backend of the previous transforms, so I used to avoid the torchvision transforms completely and implement the ones I needed locally, mostly with OpenCV.

I also saw that color transforms were added in #275. These are all great! Is there a plan to add spatial transform augmentations such as translation, rotation, shear, etc. (warping in general)? They are crucial when training on limited datasets, as in attribute prediction, person re-id, extreme classification, etc.

alykhantejani commented 6 years ago

Hi @esube, this is something we could potentially do and, as PyTorch now has support for spatial transformers, this could be implemented as just sampling an affine transformation matrix.

I'll try to put something together for this in the next week or so.

esube commented 6 years ago

@alykhantejani Thanks for your fast response, as always. Yeah, the easiest way to implement this is with a generic affine transform matrix, just like OpenCV's warp function. Better yet, you might let the user pass in the affine matrix directly, with some default behavior for the specific specializations, i.e. translation, rotation, shear, etc.
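To make the "user matrix with specialized defaults" idea concrete, here is a minimal pure-Python sketch. Everything in it (`warp_matrix`, `translation`, `rotation`, `shear_x`, `apply`, and the composition order) is a hypothetical illustration, not torchvision or OpenCV API. Matrices are 3x3 homogeneous lists so that they compose by ordinary matrix multiplication.

```python
import math


def matmul3(p, q):
    """3x3 matrix product p @ q (q is applied to a point first, then p)."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def shear_x(degrees):
    # Horizontal shear: x' = x + tan(angle) * y
    return [[1.0, math.tan(math.radians(degrees)), 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def warp_matrix(matrix=None, angle=0.0, translate=(0.0, 0.0), shear=0.0):
    """Return the user's matrix if given, else build one from the defaults.

    Assumed order for this sketch: shear, then rotate, then translate.
    """
    if matrix is not None:
        return matrix
    return matmul3(translation(*translate),
                   matmul3(rotation(angle), shear_x(shear)))


def apply(m, x, y):
    """Map the point (x, y) through the affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

With no arguments `warp_matrix()` yields the identity, `warp_matrix(translate=(2, 3))` is a pure translation, and passing `matrix=...` bypasses the defaults entirely.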

alykhantejani commented 6 years ago

I think this functionality will be added as part of #303

daavoo commented 6 years ago

I have code somewhere that implements this functionality using PIL.Image.transform and Image.AFFINE. I need to find some time to adapt the code to the current transforms API, but I'm a little busy right now. Maybe next week.
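For readers unfamiliar with that PIL call, a minimal sketch of a translation via `Image.transform` and `Image.AFFINE` might look like the following. It assumes Pillow is installed, and `translate_image` is a hypothetical helper, not the code referred to above. Note that PIL expects the inverse mapping: the six coefficients say which input pixel each output pixel is read from.

```python
from PIL import Image


def translate_image(img, tx, ty):
    """Shift the image content by (tx, ty) pixels, keeping the canvas size.

    Image.AFFINE takes (a, b, c, d, e, f) and reads output pixel (x, y)
    from input position (a*x + b*y + c, d*x + e*y + f). Moving content by
    +tx therefore needs c = -tx (and likewise f = -ty).
    """
    coeffs = (1, 0, -tx, 0, 1, -ty)
    return img.transform(img.size, Image.AFFINE, coeffs)


# Tiny demo: a single white pixel on a black 4x4 grayscale image.
img = Image.new("L", (4, 4), 0)
img.putpixel((1, 1), 255)
shifted = translate_image(img, 1, 0)  # move content one pixel to the right
```

Areas that map outside the source image are filled with the default fill value (black here).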

daavoo commented 6 years ago

Opened a PR with random translation: #363.

@esube @alykhantejani Question: Do we want a "generic" RandomAffine transform that lets the user specify the 6 parameters of the affine matrix, or is it better to have specific transforms like RandomRotation, RandomTranslation, etc., or both?

alykhantejani commented 6 years ago

I think both would be good, i.e. RandomRotate + RandomTranslate are useful and often clearer if you just want translation, for example. However, the power of the full affine matrix is good too (and underneath it can just call the rotation + translation functions).
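One way to read the "both" option is sketched below. The class names are hypothetical stand-ins that only sample matrices rather than transform images, and none of this is the API that eventually landed in torchvision. The generic transform simply delegates to the specific ones and composes their results.

```python
import math
import random


def rotation_matrix(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation_matrix(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def compose(p, q):
    """3x3 matrix product p @ q: q is applied first, then p."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


class RandomRotation:
    """Samples a rotation by an angle in [-max_degrees, max_degrees]."""

    def __init__(self, max_degrees, rng=None):
        self.max_degrees = max_degrees
        self.rng = rng if rng is not None else random.Random()

    def sample(self):
        angle = self.rng.uniform(-self.max_degrees, self.max_degrees)
        return rotation_matrix(angle)


class RandomTranslation:
    """Samples a shift in [-max_shift, max_shift] along each axis."""

    def __init__(self, max_shift, rng=None):
        self.max_shift = max_shift
        self.rng = rng if rng is not None else random.Random()

    def sample(self):
        tx = self.rng.uniform(-self.max_shift, self.max_shift)
        ty = self.rng.uniform(-self.max_shift, self.max_shift)
        return translation_matrix(tx, ty)


class RandomAffine:
    """Generic transform built on the specific ones: rotate, then translate."""

    def __init__(self, max_degrees=0.0, max_shift=0.0, rng=None):
        rng = rng if rng is not None else random.Random()
        self.rotation = RandomRotation(max_degrees, rng)
        self.translation = RandomTranslation(max_shift, rng)

    def sample(self):
        return compose(self.translation.sample(), self.rotation.sample())
```

Users who only want translation can use `RandomTranslation` directly, while `RandomAffine(max_degrees=30, max_shift=5).sample()` returns the combined matrix.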