pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

resize with pad transformation #6236

Open · oekosheri opened this issue 2 years ago

oekosheri commented 2 years ago

🚀 The feature

In TensorFlow, tf.image has a method, tf.image.resize_with_pad, that resizes and pads an image when the aspect ratios of the input and output differ, to avoid distortion. I couldn't find an equivalent among the torch transformations and had to write it myself. I think it would be a useful feature to have.
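For reference, this is what the TensorFlow call looks like (a minimal sketch; the 768 x 1024 target is just an example):

import tensorflow as tf

# a square image gets letterboxed: resized to 768 x 768, then padded to 768 x 1024
image = tf.random.uniform((600, 600, 3))          # H x W x C
out = tf.image.resize_with_pad(image, 768, 1024)  # (target_height, target_width)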

Motivation, pitch

When moving to PyTorch from TensorFlow, one does not want to lose handy features!

Alternatives

No response

Additional context

No response

cc @vfdev-5 @datumbox

zhiqwang commented 2 years ago

Hi @oekosheri , you can check the following function; it does bottom-right padding:

https://github.com/pytorch/vision/blob/d6e39ff76c82c7510f68a7aa637f015e7a86f217/torchvision/models/detection/transform.py#L25-L71
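A minimal sketch of the same idea using only public ops (the function name and the 800-pixel target are assumptions for illustration, not torchvision API):

import torch
import torch.nn.functional as F

def resize_bottom_right_pad(img: torch.Tensor, size: int = 800) -> torch.Tensor:
    # img: (C, H, W) float tensor; scale so the longer side equals `size`
    h, w = img.shape[-2:]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    img = F.interpolate(img[None], size=(new_h, new_w),
                        mode="bilinear", align_corners=False)[0]
    # pad only on the bottom and the right, up to (size, size)
    pad_h, pad_w = size - new_h, size - new_w
    return F.pad(img, (0, pad_w, 0, pad_h))  # (left, right, top, bottom)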

And I wrote a similar letterboxing mode below:

https://github.com/zhiqwang/yolov5-rt-stack/blob/main/yolort/models/transform.py#L65-L109

oekosheri commented 2 years ago

Hi @zhiqwang, thanks! You mean I can use torch.nn.functional.interpolate? I tried it on an image tensor just now and it consistently gives a value error about the input and output sizes not matching. Also, this is pretty hidden. Why not add a simple wrapper around resize that pads when the aspect ratio can't be preserved?
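(For what it's worth, that size-mismatch error likely comes from passing an unbatched tensor: torch.nn.functional.interpolate expects a leading batch dimension. A minimal sketch:

import torch
import torch.nn.functional as F

img = torch.rand(3, 600, 800)           # a single C x H x W image
out = F.interpolate(img.unsqueeze(0),   # interpolate wants (N, C, H, W)
                    size=(768, 1024), mode="bilinear", align_corners=False)
out = out.squeeze(0)                    # back to C x H x W

Note that interpolate alone stretches the image to the target size; it does no padding, so aspect-ratio handling still has to be done separately.)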

zhiqwang commented 2 years ago

> Also, this is pretty hidden. Why not add a simple wrapper around resize that pads when the aspect ratio can't be preserved?

Let's invite @datumbox to this discussion and hear his viewpoint on this problem.

zhiqwang commented 2 years ago

Just FYI, a previous issue #3286 also has some relevance to the discussion here.

datumbox commented 2 years ago

@oekosheri Thanks for the proposal.

I would like to understand more about the use-case. Why can't we just use resize in combination with pad? It should be two relatively straightforward calls. Maintaining TorchVision is a balancing act between providing the necessary primitives for people to build upon and avoiding bloat in the library. A good reason to add functionality is if it's very popular, or if there are specific tricky corner-cases that need to be handled carefully. Is this the case here?
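For reference, the two-call version could look like this (a sketch, not an official recipe; a 768 x 1024 target and a PIL input are assumed):

import torchvision.transforms.functional as F

def resize_then_pad(image, h=768, w=1024):
    # scale so the image fits inside (h, w) without distortion
    iw, ih = image.size  # PIL convention: (width, height)
    scale = min(h / ih, w / iw)
    nh, nw = int(ih * scale), int(iw * scale)
    image = F.resize(image, [nh, nw])
    # pad the remainder symmetrically; torchvision order: (left, top, right, bottom)
    pw, ph = w - nw, h - nh
    return F.pad(image, [pw // 2, ph // 2, pw - pw // 2, ph - ph // 2])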

@zhiqwang I wouldn't recommend using the method from detection, as it's private and might change in the near future. Though you are right that the detection transforms file does what @oekosheri wants (resize and then batch + pad), the code does too many things and is tightly coupled to the detection logic. We've started moving some of this logic to the references, and in the near future we plan to start porting it into main TorchVision. @vfdev-5 is currently working on the prototype transforms to finalize the API.

oekosheri commented 2 years ago

Hi @datumbox , imagine you have input images from different sources with different sizes and aspect ratios, and you want to transform them all to one final size without distortion. If you separate out pad and resize, you have to manually apply different transforms to different images. With a single transform applied to all inputs, the transform itself can check whether to pad and how. Example code would be something like this:

import torchvision.transforms.functional as F

class Resize_with_pad:
    def __init__(self, w=1024, h=768):
        self.w = w
        self.h = h

    def __call__(self, image):
        w_1, h_1 = image.size  # PIL image: (width, height)
        ratio_f = self.w / self.h
        ratio_1 = w_1 / h_1

        # check if the original and final aspect ratios are the same within a margin
        if round(ratio_1, 2) != round(ratio_f, 2):

            # padding to preserve the aspect ratio
            hp = int(w_1 / ratio_f - h_1)
            wp = int(ratio_f * h_1 - w_1)
            if hp > 0 and wp < 0:
                # image is too wide for the target ratio: pad top and bottom
                hp = hp // 2
                image = F.pad(image, (0, hp, 0, hp), 0, "constant")
            elif hp < 0 and wp > 0:
                # image is too tall for the target ratio: pad left and right
                wp = wp // 2
                image = F.pad(image, (wp, 0, wp, 0), 0, "constant")

        # resize in a single place, so every path returns an image
        return F.resize(image, [self.h, self.w])
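Usage would be something like this (the file name is hypothetical; the input is assumed to be a PIL image, since image.size above is the PIL (width, height) tuple):

from PIL import Image

rwp = Resize_with_pad(w=1024, h=768)
img = Image.open("example.jpg")  # hypothetical input file
out = rwp(img)                   # PIL image of size (1024, 768)
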
datumbox commented 2 years ago

@oekosheri I understand this is strongly motivated for the Detection use-case where things need to be resized to a maximum size proportionally and then padded to ensure we can produce batches, right?

oekosheri commented 2 years ago

@datumbox They are padded to ensure that images whose original aspect ratio differs from the final one don't get distorted. Distorted images may not work well with CNNs. I fixed a mistake in the code above; as it stands now, it produces exactly the output that tf.image.resize_with_pad does.

datumbox commented 2 years ago

@oekosheri Thanks for the references and context. I'll sync with @vfdev-5 offline to see if we can add this on the new API and how. I'll leave the issue open to ensure it stays on our radar.

Inkorak commented 2 years ago

Yes, I also think that such a transformation would be very useful. I have had cases with images of different resolutions and aspect ratios where cropping could lose pieces important for classification (it was defect classification, and the defects could be at the edge of the image), and I wanted to maintain the aspect ratio to avoid strong distortion. So I had to use a combination of LongestMaxSize and PadIfNeeded from the Albumentations library. I would like something similar, implemented as suggested here in the form of a single transformation.
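That Albumentations combination might look like this (a sketch assuming a 1024 x 1024 target; check your Albumentations version, since some parameter names have changed across releases):

import cv2
import albumentations as A

transform = A.Compose([
    A.LongestMaxSize(max_size=1024),                          # resize, keep aspect ratio
    A.PadIfNeeded(min_height=1024, min_width=1024,
                  border_mode=cv2.BORDER_CONSTANT, value=0),  # zero-pad the rest
])
# out = transform(image=numpy_image)["image"]  # operates on NumPy arrays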

H-Sorkatti commented 2 years ago

I strongly second this feature. It is a very important transformation to have; I almost always have to use hacks to work around this when working with images.

AsiaCao commented 1 year ago

Ditto, it'd be really handy to have one. Our team also uses such a feature. We currently have a custom version implemented with Albumentations that only works on NumPy arrays (not torch tensors), and we are looking for an alternative that works with torch tensors and can be converted/embedded into an ONNX graph via torch.onnx.export.
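One way to sketch a tensor-only, traceable version (the module name is hypothetical and a fixed target size is assumed; with plain tracing the input height/width get baked into the graph, so dynamic input shapes need extra care):

import torch
import torch.nn.functional as F

class ResizeWithPad(torch.nn.Module):
    def __init__(self, h: int = 768, w: int = 1024):
        super().__init__()
        self.h, self.w = h, w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W)
        h, w = x.shape[-2], x.shape[-1]
        scale = min(self.h / h, self.w / w)  # fit inside the target without distortion
        nh, nw = int(h * scale), int(w * scale)
        x = F.interpolate(x, size=(nh, nw), mode="bilinear", align_corners=False)
        ph, pw = self.h - nh, self.w - nw
        # F.pad order for the last two dims: (left, right, top, bottom)
        return F.pad(x, (pw // 2, pw - pw // 2, ph // 2, ph - ph // 2))

# export with a fixed-size dummy input
torch.onnx.export(ResizeWithPad(), torch.rand(1, 3, 600, 800), "resize_with_pad.onnx")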

Curious: do you think the torchvision team could implement it soon? @datumbox @zhiqwang

datumbox commented 1 year ago

@AsiaCao Thanks for the input. Right now we are focusing on finalizing the Transforms V2 API. Once we complete the work on that front, we can review this request and see what the best way forward is.

AsiaCao commented 1 year ago

thanks @datumbox

swap-10 commented 1 year ago

Any plans for this to be implemented now? This would be convenient to have. Thanks!

amanikiruga commented 12 months ago

Any update on this?