pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

transforms of auxiliary data #329

Closed ghost closed 6 years ago

ghost commented 7 years ago

Problems like object detection/segmentation need transforms on the image to be concomitant with transforms on auxiliary data (bounding boxes, segmentation masks, ...). I have had to implement such functionality by porting functions from https://github.com/tensorflow/models/blob/master/research/object_detection/core/preprocessor.py

If there's interest in having such functionality reside here, I would be very interested in contributing to this. I'm not entirely sure what design choices I'd have to abide by here.

(I also have been porting some non-vectorizable procedures for NMS and RandomCrop sampling, which likely should also reside here).

daavoo commented 7 years ago

I'm also interested in including transforms for segmentation. I think it should be quite straightforward to extend the existing transforms to support multiple inputs, given the current functional API. Maybe something like:

class MultiRandomRotation(RandomRotation):
    def __call__(self, img, target):
        # sample the angle once so both inputs get the same rotation
        angle = self.get_params(self.degrees)
        new_img = rotate(img, angle, self.resample, self.expand, self.center)
        new_target = rotate(target, angle, self.resample, self.expand, self.center)
        return new_img, new_target
alykhantejani commented 7 years ago

@daavoo is correct; the functional API was introduced to solve exactly this problem. See this comment for more info on how this would be possible.
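As a concrete illustration of that pattern (sample the random parameters once, then apply the same functional op to every input), here is a minimal sketch; the class name is hypothetical, and NumPy arrays stand in for the image and mask:

```python
import random
import numpy as np

class JointRandomHorizontalFlip:
    """Hypothetical joint transform: sample the coin flip once,
    then apply the same decision to both image and mask."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, img, mask):
        if random.random() < self.p:
            # both arrays are flipped, or neither is
            return np.fliplr(img).copy(), np.fliplr(mask).copy()
        return img, mask
```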

@akssri as for new transforms for segmentation etc, these would be very welcome. It might be worth syncing with @killeent who is also working on some of these/has some ideas around this.

killeent commented 7 years ago

Hi @akssri - what particular functionality have you ported thus far? I am looking into some of the object detection tooling necessary to implement a Faster R-CNN, but it's very much in the early stages.

daavoo commented 7 years ago

Is there a desired naming convention for segmentation/detection transforms? For example, RandomRotation_Seg or RandomRotation_Det for the respective extensions of RandomRotation?

ghost commented 7 years ago

@alykhantejani @daavoo Yes, for the masks there's not much to be done, since they are, in essence, additional channels on the image. The functions for applying these transformations to bounding boxes will, however, need to be of a different flavor.

Tensorflow's object detection code (AFAIR) uses separate classes for boxes, points, and so on, with transforms defined on them individually, but one only needs to support transformations on arrays of normalized coordinates to deal with most of these tasks (including convex masks).

I'd personally prefer to extend the relevant transformations in functional.py by having them take an additional 2D-coordinates parameter that gets co-transformed when given. Having separate functions for feature inputs and for coordinate inputs is not going to be pretty, IMO.
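A minimal sketch of that idea, assuming a NumPy HxW(xC) image and an optional (N, 2) array of normalized (x, y) points; the function is illustrative, not the actual functional.py signature:

```python
import numpy as np

def crop(img, top, left, height, width, coords=None):
    """Crop an array; if `coords` is given, co-transform the normalized
    (x, y) points into the coordinate frame of the crop."""
    out = img[top:top + height, left:left + width]
    if coords is None:
        return out
    H, W = img.shape[:2]
    xy = coords * np.array([W, H], dtype=float)   # normalized -> pixels
    xy -= np.array([left, top], dtype=float)      # shift into crop frame
    xy /= np.array([width, height], dtype=float)  # renormalize to [0, 1]
    return out, xy
```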

@killeent I have a number of bounding-box-aware resize/crop/pad... functions implemented. I'm currently using the Cython NMS function from https://github.com/rbgirshick/py-faster-rcnn and am in the process of porting sample_distorted_bounding_box from Tensorflow: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/sample_distorted_bounding_box_op.cc

These are really the only non-pythonic bits necessary for fast inference/training on single-shot detectors (SSD, YOLO); the addition of these ops in Tensorflow coincided with their object detection efforts. Faster R-CNN, AFAIK, only requires ROI pooling in addition, which is just strided slicing (or resampling); all the anchor-box infrastructure is fairly similar (at least from what I remember).

Tensorflow has an (approximate) bipartite matcher using the Hungarian algorithm, but this doesn't seem to be widely used. I'm not sure if there is a C++ kernel for it (probably not). The more common hard-mining functions can be written with vectorizable code and NMS, though.
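For reference, the greedy NMS being discussed can be sketched in pure NumPy (slower than the Cython kernel, but it shows the algorithm); boxes are assumed to be [x1, y1, x2, y2]:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop the remaining boxes that overlap it above iou_thresh."""
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return np.array(keep)
```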

Now that I think about it, I wonder if some of these functions (NMS) should go into pytorch.

fmassa commented 7 years ago

@akssri The original point of having the functional API for torchvision was to keep things simple and to reuse code whenever possible in those more complex cases. I think that having separate functions for different data domains is better and keeps the interface simpler. So I think it would be better to implement a flip_bbox function that takes the box plus the image width and performs the flip. I think that nms is very specific to some vision tasks and should ideally live outside pytorch (maybe in torchvision?).
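The flip_bbox function described here could look something like the following (a sketch of the proposal, not an existing torchvision function, assuming [x1, y1, x2, y2] boxes):

```python
def flip_bbox(box, img_width):
    """Horizontally flip an [x1, y1, x2, y2] box within an image of the
    given width: the right edge becomes the new left edge, mirrored."""
    x1, y1, x2, y2 = box
    return [img_width - x2, y1, img_width - x1, y2]
```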

ghost commented 7 years ago

@fmassa Fair enough. Is it okay to cook up some decorator magic to simulate multiple dispatch (while keeping the names the same)? I can imagine this would make things easier further on, while still keeping the functional API.

ghost commented 7 years ago

To make things concrete, I have something like the following in mind.

import numpy as np

class GenericFunction(object):
    """Generic function that despatches on the type of its first argument."""
    methods = {}; initializedp = False
    def __new__(cl, *args, **kwargs):
        if not cl.initializedp:
            # touch every attribute so the Method descriptors register themselves
            [getattr(cl, _x) for _x in dir(cl)]
            cl.initializedp = True
        return cl.__call__(cl, *args, **kwargs)
    def __call__(cl, x, *args, **kwargs):
        despatch_method = cl.methods[type(x)]
        return despatch_method(x, *args, **kwargs)

class Method(object):
    """Descriptor that registers its function in the owner's despatch table."""
    def __init__(self, function, despatch_type):
        self.function = function
        self.despatch_type = despatch_type
    def __get__(self, instance, objtype=None):
        if objtype is not None:
            objtype.methods[self.despatch_type] = self.function
        return self.function

def defmethod(type):
    def wrapper(function):
        return Method(function, type)
    return wrapper

class crop(GenericFunction):
    methods = {}; initializedp = False
    @defmethod(int)
    def _int(x):
        return x + 1
    @defmethod(np.ndarray)
    def _ndarray(x):
        return x + np.pi
tribbloid commented 5 years ago

Why is it closed if you haven't found a solution to it?

fmassa commented 5 years ago

@tribbloid in 0.3, we provide reference training scripts for classification, detection, and segmentation. They include helper functions to perform data transformations on segmentation masks / bounding boxes / keypoints.

They are currently under the references/ folder in the torchvision repo, and once we are clearer on the API they will be moved into the torchvision package.

juanmed commented 5 years ago

@fmassa

Thanks for clarifying the location of the reference transformations. I was wondering if there is any reference script we can look at for using them. I looked at the Colab notebook posted with the release of 0.3 and the reference training code, but both of them use only ToTensor and RandomHorizontalFlip, which do not handle the target dictionary.

To be more specific, I would like to use RandomResize, RandomCrop, and CenterCrop from references/segmentation/transforms.py, but they do not seem to work with the target dictionary, which contains 'boxes' and 'area' keys that should also be modified after resizing the image.

What is the correct way to use these methods? How should I pass the target_dict or its elements to be modified accordingly?

Thanks for your feedback!

fmassa commented 5 years ago

@juanmed just follow the implementation in https://github.com/pytorch/vision/blob/master/references/detection/transforms.py and adapt it to your needs.

Note that Resize is now part of the detection models, and is currently implemented in https://github.com/pytorch/vision/blob/master/torchvision/models/detection/transform.py
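The core of the pattern in references/detection/transforms.py is that every transform takes and returns an (image, target) pair, chained by a small Compose; a sketch of that shape (the transform bodies themselves are whatever your task needs):

```python
class Compose:
    """Chain transforms that take and return an (image, target) pair,
    following the pattern used by the detection reference scripts."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target
```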