pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] New Augmentation techniques in Torchvision #3817

Open oke-aditya opened 3 years ago

oke-aditya commented 3 years ago

šŸš€ Feature

Inclusion of new Augmentation techniques in torchvision.transforms.

Motivation

Transforms are important for data augmentation :sweat_smile:

Proposals

Additional context

To visitors: kindly give a :+1: if you think any of these would help in your work.

Also, if you have a transform in mind, please provide a few details here!

Linked to #3221

cc @vfdev-5 @fmassa

hassiahk commented 3 years ago

Would be nice to have these Augmentations in torchvision.transforms. šŸ˜„

Also found this official code implementation for Cutout.

datumbox commented 3 years ago

@oke-aditya Do we need Cutout, given that we have RandomErasing, which can be configured to have more or less the same effect?

oke-aditya commented 3 years ago

I think the same. I compared both implementations: RandomErasing is newer than Cutout, and the two augmentations produce almost identical results.

Also, as per the docs, RandomErasing does not work on PIL Images; it works only on torch.Tensor. I am not sure if that is intentional or needs some work.
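For reference, here is a minimal pure-tensor Cutout sketch (a hypothetical helper, not a torchvision API) that roughly matches what RandomErasing produces when configured with `value=0` and a fixed square patch:

```python
import torch

def cutout(img: torch.Tensor, size: int, value: float = 0.0) -> torch.Tensor:
    """Zero out a square patch of side `size` at a random location.

    Roughly equivalent to RandomErasing with a square aspect ratio and
    a constant fill value. Expects a CHW tensor; returns a new tensor.
    """
    img = img.clone()
    _, h, w = img.shape
    # Sample the patch center, then clamp the patch to the image bounds.
    cy = int(torch.randint(0, h, (1,)))
    cx = int(torch.randint(0, w, (1,)))
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    img[:, y1:y2, x1:x2] = value
    return img
```

Unlike RandomErasing, which samples the erased area and aspect ratio from ranges, Cutout uses a fixed patch size and allows the patch to be clipped at the image border.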

tflahaul commented 3 years ago

Not a transform idea, but what about adding an optional `target_transforms` argument to transforms.RandomApply? That way a random image transform and its target equivalent could be applied at the same time. Currently (as far as I know) the way to do this is to write your own class and use the functional transforms. For example:

import torch

class AugmentExample:
    def __init__(self, p=0.1):
        self.p = p

    def __call__(self, img, box):
        # Flip with probability p (box_hflip is a user-defined helper
        # that mirrors the box coordinates horizontally).
        if torch.rand(1) < self.p:
            img = img.flip(-1)
            box = box_hflip(box)
        return img, box

Also, having many more keypoint/bbox transforms would be really great (ideally, any image transform that involves transforming the targets should be accompanied by one?).

(Sorry if my English isn't right, I speak baguette.)
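The proposal above might look something like the following sketch (a hypothetical `RandomApplyPaired` wrapper, not an existing torchvision class):

```python
import torch

class RandomApplyPaired:
    """Hypothetical sketch of the proposed behavior: apply an image
    transform and its matching target transform together, with
    probability p. Not a torchvision API."""

    def __init__(self, img_transform, target_transform, p=0.5):
        self.img_transform = img_transform
        self.target_transform = target_transform
        self.p = p

    def __call__(self, img, target):
        # Draw once, so image and target are always kept in sync.
        if torch.rand(1).item() < self.p:
            return self.img_transform(img), self.target_transform(target)
        return img, target
```

The key design point is that the random draw happens once, so the image and its targets are always transformed (or skipped) together.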

oke-aditya commented 2 years ago

@datumbox is closing this issue intended?

As I understand it, there is a dataset and transforms rework coming, which would be a major refactor. Do we plan to migrate all transforms to the new ones in the near future?

(I had a quick look at the proposal, which looks fantastic.)

datumbox commented 2 years ago

Not at all intended; GitHub just closed it when I merged the PR. We are certainly not done here :D

lezwon commented 2 years ago

@datumbox I'd like to take up the ReMixMatch augmentation if no one's working on it. I would need some guidance on how to go about it though :)

datumbox commented 2 years ago

@lezwon Thanks a lot for offering to help!

ReMixMatch focuses on learning augmentations and on using unlabelled data. One challenge is that the majority of the changes would have to land in the reference scripts, which are outside of TorchVision. Currently the reference scripts need some rework to reduce the amount of duplicate code and improve the overall quality. It's at the top of our todos, and until that's done we would ideally like to avoid introducing significantly complex techniques like ReMixMatch.

I wonder if you would be interested in implementing the AutoAugment Detection algorithm listed above. @vfdev-5 has already added most of the necessary low-level kernels for doing transforms on the BBoxes in torchvision.prototype, so what's needed is to implement the AutoAugment technique itself. Of course since it touches prototype APIs it can be tricky too. Let me know your thoughts and perhaps Victor can also pitch in to see if it makes sense to work together and test the new API. Alternatively we can discuss for another contribution that you find interesting.

BTW I'm currently working on the SimpleCopyPaste contribution, trying to see if we can train more accurate models using it. I'll let you know when I have the full results. :)

lezwon commented 2 years ago

@datumbox AutoAugment sounds good. I'll start looking into it. :) Also, I noticed your comment on SimpleCopyPaste PR. Lemme know if I can help in any way :)

datumbox commented 2 years ago

@lezwon Fantastic! Just note that I'm talking about this version designed for detection: AutoAugment Detection. This is different from the already supported algorithm for classification. :)
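To illustrate the difference: the detection variant searches over sub-policies whose ops can act on boxes as well as pixels. A rough sketch of the policy structure (the op names are illustrative, in the spirit of the AutoAugment-for-detection paper; this is not torchvision code):

```python
import random

# Each sub-policy is a sequence of (op_name, probability, magnitude)
# triples. Detection policies mix whole-image ops with box-aware ops.
POLICIES = [
    [("TranslateX_BBox", 0.6, 4), ("Equalize", 0.8, 10)],
    [("BBox_Cutout", 0.2, 1), ("Rotate_BBox", 0.6, 6)],
]

def sample_sub_policy(policies, rng=random):
    """Pick one sub-policy uniformly at random; at apply time each op
    in it fires independently with its own probability."""
    return rng.choice(policies)
```

The classification AutoAugment already in torchvision has the same sub-policy shape, but its ops never need to update bounding boxes, which is what the new low-level kernels in torchvision.prototype enable.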

vfdev-5 commented 2 years ago

Setting aside how the transforms implementation in the prototype works (how input type dispatch happens, etc.), the only thing I think we might be missing to implement AA Detection is the bbox_only_* augmentations. I do not think they are complicated to implement.
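As a sketch of what a bbox_only_* kernel does (a hypothetical helper, assuming CHW tensors and xyxy pixel-coordinate boxes), the op is applied only to the pixels inside each bounding box:

```python
import torch

def bbox_only_op(img: torch.Tensor, boxes: torch.Tensor, op) -> torch.Tensor:
    """Apply `op` only to the pixel regions inside each bounding box.

    img:   CHW tensor.
    boxes: (N, 4) tensor of xyxy boxes in pixel coordinates.
    op:    callable mapping a CHW patch to a same-shaped patch.
    """
    out = img.clone()
    for x1, y1, x2, y2 in boxes.round().int().tolist():
        # Crop the box region, transform it, and paste it back.
        out[:, y1:y2, x1:x2] = op(out[:, y1:y2, x1:x2])
    return out
```

Color ops (equalize, solarize, etc.) leave the boxes unchanged, while geometric bbox_only_* ops additionally need to update the box coordinates, which is where the prototype's low-level bbox kernels come in.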

ain-soph commented 1 year ago

May I ask what the current plan for Fast AutoAugment is?

I have an implementation of another paper, called Faster AutoAugment:
https://github.com/ain-soph/trojanzoo/tree/main/trojanvision/utils/autoaugment. It's a reimplementation based on AutoAlbument.

Are the maintainers interested in embedding this technique as well? If so, what's the expected API for using it?
If there is a similar plan for Fast AutoAugment to use as a template, I'm glad to follow it.

Related issue: #5000

datumbox commented 1 year ago

@ain-soph The Fast* AutoAugment methods are indeed on our radar. We should examine adding them after the work on the new Transforms API is complete. Let me explain why it's not a primary target at this point:

One area where we could use help is models, particularly Video architectures. Have a look at #2707 for some ideas. I hope the current situation won't discourage you from sticking around and continuing to contribute to TorchVision. We definitely want any help we can get from the community! :)