pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] New Augmentation techniques in Torchvision #3817

Open oke-aditya opened 3 years ago

oke-aditya commented 3 years ago

šŸš€ Feature

Inclusion of new Augmentation techniques in torchvision.transforms.

Motivation

Transforms are important for data augmentation :sweat_smile:

Proposals

Additional context

To visitors: kindly give a :+1: if you think any of these would help in your work.

Also, if you have a transform in mind, please provide a few details here!

Linked to #3221

cc @vfdev-5 @fmassa

hassiahk commented 3 years ago

Would be nice to have these Augmentations in torchvision.transforms. šŸ˜„

Also found this official code implementation for Cutout.

datumbox commented 3 years ago

@oke-aditya Do we need Cutout, given that we have RandomErasing, which can be configured to have more or less the same effect?

oke-aditya commented 3 years ago

I think the same. I compared both implementations: RandomErasing is newer than Cutout, and the two augmentations produce almost identical results.

Also, as per the docs, RandomErasing does not work on PIL Images; it works only on torch.Tensor. I am not sure if that is intentional or needs some work.
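For reference, here is a minimal pure-tensor Cutout sketch (a hypothetical helper, not a torchvision API) that roughly matches what RandomErasing produces when configured with `value=0` and a fixed square patch:

```python
import torch

def cutout(img: torch.Tensor, size: int, value: float = 0.0) -> torch.Tensor:
    """Zero out a square patch of side `size` at a random location.

    Roughly equivalent to RandomErasing with a square aspect ratio and
    a constant fill value. Expects a CHW tensor; returns a new tensor.
    """
    img = img.clone()
    _, h, w = img.shape
    # Sample the patch center, then clamp the patch to the image bounds.
    cy = int(torch.randint(0, h, (1,)))
    cx = int(torch.randint(0, w, (1,)))
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    img[:, y1:y2, x1:x2] = value
    return img
```

Unlike RandomErasing, which samples the erased area and aspect ratio from ranges, Cutout uses a fixed patch size and allows the patch to be clipped at the image border.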

tflahaul commented 3 years ago

Not a transform idea, but what about adding an optional `target_transforms` argument to transforms.RandomApply? That way a random image transform and its target equivalent could be applied at the same time. Currently (as far as I know) the way to do this is to write your own class and use the functional transforms. For example:

import torch

class AugmentExample:
    def __init__(self, p=0.1):
        self.p = p

    def __call__(self, img, box):
        # Flip with probability p (box_hflip is a user-defined helper
        # that mirrors the box coordinates horizontally).
        if torch.rand(1) < self.p:
            img = img.flip(-1)
            box = box_hflip(box)
        return img, box

Also, having many more keypoint/bbox transforms would be really great (ideally, any image transform that involves transforming the targets should be accompanied by one?).

(Sorry if my English isn't right, I speak baguette.)
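The proposal above might look something like the following sketch (a hypothetical `RandomApplyPaired` wrapper, not an existing torchvision class):

```python
import torch

class RandomApplyPaired:
    """Hypothetical sketch of the proposed behavior: apply an image
    transform and its matching target transform together, with
    probability p. Not a torchvision API."""

    def __init__(self, img_transform, target_transform, p=0.5):
        self.img_transform = img_transform
        self.target_transform = target_transform
        self.p = p

    def __call__(self, img, target):
        # Draw once, so image and target are always kept in sync.
        if torch.rand(1).item() < self.p:
            return self.img_transform(img), self.target_transform(target)
        return img, target
```

The key design point is that the random draw happens once, so the image and its targets are always transformed (or skipped) together.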

oke-aditya commented 2 years ago

@datumbox is closing this issue intended?

As I understand it, there is a dataset and transforms rework coming, which would be a major refactor. Do we plan to migrate all transforms to the new ones in the near future?

(I had a quick look at the proposal, which looks fantastic.)

datumbox commented 2 years ago

Not at all intended; GitHub just closed it when I merged the PR. We are certainly not done here :D

lezwon commented 2 years ago

@datumbox I'd like to take up the ReMixMatch augmentation if no one's working on it. I would need some guidance on how to go about it though :)

datumbox commented 2 years ago

@lezwon Thanks a lot for offering to help!

ReMixMatch focuses on learning augmentations and on using unlabelled data. One challenge is that the majority of the changes would have to land in the reference scripts, which are outside of TorchVision. Currently the reference scripts need some rework to reduce the amount of duplicate code and improve the overall quality. It's at the top of our todos, and until that's done we would ideally like to avoid introducing significantly complex techniques like ReMixMatch.

I wonder if you would be interested in implementing the AutoAugment Detection algorithm listed above. @vfdev-5 has already added most of the necessary low-level kernels for doing transforms on the BBoxes in torchvision.prototype, so what's needed is to implement the AutoAugment technique itself. Of course since it touches prototype APIs it can be tricky too. Let me know your thoughts and perhaps Victor can also pitch in to see if it makes sense to work together and test the new API. Alternatively we can discuss for another contribution that you find interesting.

BTW I'm currently working on the SimpleCopyPaste contribution, trying to see if we can train more accurate models using it. I'll let you know when I have the full results. :)

lezwon commented 2 years ago

@datumbox AutoAugment sounds good. I'll start looking into it. :) Also, I noticed your comment on SimpleCopyPaste PR. Lemme know if I can help in any way :)

datumbox commented 2 years ago

@lezwon Fantastic! Just note that I'm talking about this version designed for detection: AutoAugment Detection. This is different from the already supported algorithm for classification. :)
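To illustrate the difference: the detection variant searches over sub-policies whose ops can act on boxes as well as pixels. A rough sketch of the policy structure (the op names are illustrative, in the spirit of the AutoAugment-for-detection paper; this is not torchvision code):

```python
import random

# Each sub-policy is a sequence of (op_name, probability, magnitude)
# triples. Detection policies mix whole-image ops with box-aware ops.
POLICIES = [
    [("TranslateX_BBox", 0.6, 4), ("Equalize", 0.8, 10)],
    [("BBox_Cutout", 0.2, 1), ("Rotate_BBox", 0.6, 6)],
]

def sample_sub_policy(policies, rng=random):
    """Pick one sub-policy uniformly at random; at apply time each op
    in it fires independently with its own probability."""
    return rng.choice(policies)
```

The classification AutoAugment already in torchvision has the same sub-policy shape, but its ops never need to update bounding boxes, which is what the new low-level kernels in torchvision.prototype enable.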

vfdev-5 commented 2 years ago

Setting aside how the transforms implementation in the prototype works (how input type dispatch happens, etc.), the only thing I think we might be missing to implement AA Detection is the bbox_only_* augmentations. I do not think they are complicated to implement.
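As a sketch of what a bbox_only_* kernel does (a hypothetical helper, assuming CHW tensors and xyxy pixel-coordinate boxes), the op is applied only to the pixels inside each bounding box:

```python
import torch

def bbox_only_op(img: torch.Tensor, boxes: torch.Tensor, op) -> torch.Tensor:
    """Apply `op` only to the pixel regions inside each bounding box.

    img:   CHW tensor.
    boxes: (N, 4) tensor of xyxy boxes in pixel coordinates.
    op:    callable mapping a CHW patch to a same-shaped patch.
    """
    out = img.clone()
    for x1, y1, x2, y2 in boxes.round().int().tolist():
        # Crop the box region, transform it, and paste it back.
        out[:, y1:y2, x1:x2] = op(out[:, y1:y2, x1:x2])
    return out
```

Color ops (equalize, solarize, etc.) leave the boxes unchanged, while geometric bbox_only_* ops additionally need to update the box coordinates, which is where the prototype's low-level bbox kernels come in.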

ain-soph commented 1 year ago

May I ask what the current plan for Fast AutoAugment is?

I have an implementation of another paper, called Faster AutoAugment:
https://github.com/ain-soph/trojanzoo/tree/main/trojanvision/utils/autoaugment. It's a reimplementation based on AutoAlbument.

Are the maintainers interested in embedding this technique as well? If so, what's the expected API for using it?
If there is a similar plan for Fast AutoAugment to use as a template, I'm glad to follow it.

Related issue: #5000

datumbox commented 1 year ago

@ain-soph The Fast* AutoAugment methods are indeed on our radar. We should examine adding them after the work on the new Transforms API is complete. Let me explain why it's not a primary target at this point:

One area where we could use help is models, particularly Video architectures. Have a look at #2707 for some ideas. I hope the current situation won't discourage you from sticking around and continuing to contribute to TorchVision. We definitely want any help we can get from the community! :)