pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[feature proposal] Composing transformations with __add__ magic method #456

Open tomekkorbak opened 6 years ago

tomekkorbak commented 6 years ago

TLDR: transform = Resize((299, 299)) + ToTensor()

Motivation

Consider the common use case of building transformation pipelines for train and test sets that are mostly shared but differ at one point, for instance applying augmentation transforms only to the train set:

from torchvision import transforms
from torchvision.transforms import (Resize, RandomCrop, RandomHorizontalFlip,
                                    RandomGrayscale, ToTensor, Normalize)

train_transform = transforms.Compose([
    Resize((299, 299), interpolation=1),
    RandomCrop(299, padding=4),
    RandomHorizontalFlip(),
    RandomGrayscale(),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    Resize((299, 299), interpolation=1),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

A way of expressing that train_transform builds upon test_transform would be handy. What you can do now is nest transforms.Compose recursively (i.e. Compose([Compose([...resizing...]), Compose([...augmentation...]), Normalize(...)])), but I don't find that particularly clean (and, intuitively, a final transforms object should be a sequence with only one level of depth).

Proposed behavior

common_transforms = ToTensor() + Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
augmentation = RandomCrop(299, padding=4) + RandomHorizontalFlip() + RandomGrayscale()
train_transforms = Resize((299, 299)) + augmentation + common_transforms
test_transforms = Resize((299, 299)) + common_transforms

Note how this mimics the behavior of Python lists (and other sequences):

>>> [1, 2] + [3, 4]
[1, 2, 3, 4]

Implementation sketch

class Transform(object):
    # Right now all transforms directly inherit from object.
    # Let Transform be a superclass for all transformations
    # (Compose itself would also subclass Transform).

    def __call__(self, pic):
        raise NotImplementedError('Each subclass should implement this method')

    def __repr__(self):
        raise NotImplementedError('Each subclass should implement this method')

    def __add__(self, other):
        if not isinstance(other, Transform):
            raise TypeError('Only transformations can be added')
        if isinstance(self, Compose) and isinstance(other, Compose):
            return Compose(self.transforms + other.transforms)
        if isinstance(other, Compose):
            # Prepend self, keeping the result one level deep.
            return Compose([self] + other.transforms)
        if isinstance(self, Compose):
            # Append other, keeping the result one level deep.
            return Compose(self.transforms + [other])
        return Compose([self, other])
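
For illustration, a hypothetical usage of this sketch, assuming the built-in transforms (Resize, RandomHorizontalFlip, ToTensor) were reparented under Transform and Compose itself subclassed Transform:

pipeline = Resize((299, 299)) + RandomHorizontalFlip() + ToTensor()

# Each + flattens into a single Compose, so the result stays one level deep:
assert isinstance(pipeline, Compose)
assert len(pipeline.transforms) == 3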

Comments are most welcome!

fmassa commented 6 years ago

While I think such functionality might be somewhat useful (though a bit less efficient than iterating over a list), I didn't quite get the drawbacks you pointed out in your message. You could still have something like

from torchvision import transforms as T

common_transforms = T.Compose([T.ToTensor(),
                               T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

augmentation = T.Compose([T.RandomCrop(299, padding=4),
                          T.RandomHorizontalFlip(),
                          T.RandomGrayscale()])

train_transforms = T.Compose([T.Resize((299, 299)), augmentation, common_transforms])
test_transforms = T.Compose([T.Resize((299, 299)), common_transforms])

which is still fairly OK, and a bit more efficient than a long concatenation of Composes (which would result from + always creating a new T.Compose).

But I'm not against such proposal. @alykhantejani what do you think?

tomekkorbak commented 6 years ago

@fmassa, thanks for your feedback. I understand your reservations: this proposal does not extend existing functionality, it only provides a nicer syntax.

As for the advantages of the proposed API over your code snippet, take efficiency: the code I provided was a first approximation; creating new objects could be avoided in most cases, and we could end up creating fewer objects than with nested Composes (by prepending/appending to Compose.transforms whenever possible).
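
A rough sketch of that optimization, reusing the Transform/Compose names from the proposal (the trade-off is that + then mutates its operands, which callers sharing a pipeline might not expect; that is exactly what benchmarking should weigh):

def __add__(self, other):
    if not isinstance(other, Transform):
        raise TypeError('Only transformations can be added')
    if isinstance(self, Compose):
        # Extend the existing Compose in place instead of allocating a new one.
        self.transforms.extend(other.transforms if isinstance(other, Compose) else [other])
        return self
    if isinstance(other, Compose):
        # Symmetric case: prepend in place.
        other.transforms.insert(0, self)
        return other
    return Compose([self, other])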

Nevertheless, I agree it is for the community to decide if this proposal adds any value.

alykhantejani commented 6 years ago

I think this functionality would be welcome, especially as it's just syntactic sugar on top of existing functionality; as @fmassa showed, it can be achieved by nesting Composes.

I think the __repr__ of this would be much nicer than a nested set of Composes.

As for speed, I think we should do some benchmarking before we try to optimize this, as simple code is generally my preferred route.

fmassa commented 6 years ago

Sounds good. @tomekkorbak, if you are willing to send a PR, it would be most welcome! :)

fmassa commented 6 years ago

@karandwivedi42 the + operator is meant to be a syntactic sugar for the Compose API, and not to extend on the functional API.

But I agree that we should ideally have only one way of doing something.
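
For context, an illustration of the distinction (the image file and variable names here are stand-ins, not from the thread): the class-based transforms build reusable objects that Compose, or a future + operator, can chain, while the functional API applies an operation immediately.

import torchvision.transforms as T
import torchvision.transforms.functional as F
from PIL import Image

img = Image.open('example.jpg')   # hypothetical input image

# Class-based API: a reusable, composable object (what + would sugar over).
resize = T.Resize((299, 299))
resized = resize(img)

# Functional API: applied immediately, nothing to compose.
tensor = F.to_tensor(F.resize(img, (299, 299)))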

0phoff commented 6 years ago

I wanted similar behaviour and ended up creating my own Compose class for this:

class Compose(list):
    def __call__(self, data):
        for t in self:
            data = t(data)
        return data

My Compose is basically a callable list... I'm not sure whether it has any drawbacks (it probably does), but it has the advantage of being familiar to most people, since it has all the methods lists have!
A few things you can do with it:

preprocess = Compose([...])

# Concatenate composes
preprocess += Compose([...])

# Concatenate Compose with a list
preprocess += [...]
preprocess.extend([...])
preprocess.append(...)

# Access/change individual elements
step1 = preprocess[0]
preprocess[1] = ...

As you can see, it doesn't let you add a bare function to a Compose the way @tomekkorbak wanted, and it won't magically turn a sum of two transforms into a Compose, but I do believe this is quite a clean implementation. All that is needed is to wrap your functions in a list and you're done! (Plain + on a list subclass returns an ordinary list, which is why the examples above use +=, append, and extend; those mutate in place and keep the Compose type.)

I really prefer my Compose, but I'd like to stick to the 'standard PyTorch way' as much as possible, so it would be cool to get this into torchvision.

EDIT
You might also want to add an __radd__(self, other) method to my Compose if you want to be able to perform operations like [...] + preprocess. I don't really need that, so I didn't implement it, as it would make everything slightly more complicated...

Example implementation (untested):

from collections.abc import Iterable

def __radd__(self, other):
    # Invoked for `other + self`; wrap the concatenation back into a Compose.
    if isinstance(other, Iterable):
        return Compose(list(other) + list(self))
    return NotImplemented
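
With that hook in place, prepending a plain list would hand back a Compose rather than an ordinary list (a sketch under the same assumptions):

preprocess = Compose([...])                    # some existing pipeline
pipeline = [Resize((299, 299))] + preprocess   # dispatches to Compose.__radd__
assert isinstance(pipeline, Compose)
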
danielkurniadi commented 5 years ago

Has this feature been merged yet? It would be awesome if I could append transformations.

0phoff commented 5 years ago

Has this feature been merged yet? It would be awesome if I could append transformations.

I created and use my own version of a Compose class in my library. I wouldn't mind creating a PR for this, but nobody from the core team has answered how they see this class evolving...

In the meantime, you could probably just copy my code. Note that I don't have the __radd__ functionality I talked about, but that is only necessary if you want to prepend transformations: [...] + preprocess.

class Compose(list):
    """ This is lightnet's own version of :class:`torchvision.transforms.Compose`.

    Note:
        The reason we have our own version is that this one offers more freedom to the user.
        For all intents and purposes this class is just a list.
        This `Compose` version allows the user to access elements by index, append items, extend it with another list, etc.
        When calling instances of this class, it behaves just like :class:`torchvision.transforms.Compose`.

    Note:
        I proposed changing :class:`torchvision.transforms.Compose` to something similar to this version,
        which would render this class useless. In the meantime, we use our own version
        and you can track `the issue`_ to see if and when this comes to torchvision.

    .. _the issue: https://github.com/pytorch/vision/issues/456
    """
    def __call__(self, data):
        # Apply each contained transform in order.
        for tf in self:
            data = tf(data)
        return data

    def __repr__(self):
        format_string = self.__class__.__name__ + ' ['
        for tf in self:
            format_string += f'\n  {tf}'
        format_string += '\n]'
        return format_string
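
For illustration (not part of the original comment), a quick usage sketch with standard torchvision transforms:

from torchvision import transforms as T

pipeline = Compose([T.ToTensor()])
pipeline.append(T.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]))
pipeline.insert(0, T.Resize((299, 299)))  # plain list methods just work
print(pipeline)  # prints: Compose [ ...one line per transform... ]
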
danielkurniadi commented 5 years ago

Hi @0phoff, this LGTM 👍. I like that it's simple. Thanks, man.

@fmassa Should we consider adding __add__(), append(), and extend() to torchvision's transforms?

IMO, it suits applications like action recognition, or any model that has many preprocessing options for its compatible datasets. We have a lot of modalities, and the preprocessing transforms can be basic yet still differ per model-modality combination.

Of course it would help in general.

romesco commented 4 years ago

ping =] seems useful to me!

0phoff commented 3 years ago

@fmassa, is there any interest in this feature from the pytorch-vision team?
I am willing to take the time to properly implement my class Compose(list) in this repository, but I want to know whether you would be interested in merging it if I do... :)

A quick first look tells me that this would not introduce any breaking changes, just add some extra features to make composing transformation pipelines easier. As my solution subclasses the built-in Python list, most features come out of the box without much coding required, and thus with less unit testing too!

Let me know if you think this would be worth creating a PR for.