pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

In transforms.Resize, tensor interpolate is not the same as PIL resize. #2950

Closed hjinlee88 closed 2 years ago

hjinlee88 commented 3 years ago

🐛 Bug

Resize supports tensors via F.interpolate, but its behavior is not the same as Pillow resize. https://github.com/pytorch/vision/blob/f95b0533243dfbc901b5ed5f5db28a5a46bdb699/torchvision/transforms/functional.py#L309-L312

To Reproduce

Steps to reproduce the behavior:

import urllib
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt

size = 112
img = Image.open(urllib.request.urlopen("https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image01.png"))

tensor_interpolate = transforms.Compose([transforms.ToTensor(), transforms.Resize(size), transforms.ToPILImage()])
pillow_resize = transforms.Compose([transforms.Resize(size)])

plt.subplot(311)
plt.imshow(img)
plt.title("original")
plt.subplot(312)
plt.imshow(tensor_interpolate(img))
plt.title("tensor interpolate")
plt.subplot(313)
plt.imshow(pillow_resize(img))
plt.title("pillow resize")
plt.show()

image

Expected behavior

Both should produce the same or nearly identical output. Perhaps a blur is needed before interpolating.
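
For instance, a rough sketch of the "blur before interpolate" idea (purely illustrative, not what Resize currently does; it assumes torchvision >= 0.9 for TF.gaussian_blur, and the kernel size / sigma heuristic is a hypothetical choice):

import torch.nn.functional as F
import torchvision.transforms.functional as TF

def resize_with_prefilter(img, out_h, out_w):
    # img: float tensor of shape (C, H, W) in [0, 1]
    scale = img.shape[1] / out_h
    if scale > 1:  # downscaling: blur first to suppress aliasing
        k = max(3, 2 * round(scale) + 1)  # odd kernel size (hypothetical heuristic)
        img = TF.gaussian_blur(img, kernel_size=k, sigma=scale / 2)
    return F.interpolate(img.unsqueeze(0), size=(out_h, out_w),
                         mode="bilinear", align_corners=False).squeeze(0)

This only roughly approximates PIL's prefiltering, which uses an area-weighted filter rather than a Gaussian.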

Environment

I installed pytorch using the following command: conda install pytorch torchvision -c pytorch

python collect_env.py
Collecting environment information...
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: (MinGW.org GCC-8.2.0-3) 8.2.0
Clang version: Could not collect
CMake version: version 3.18.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2060
Nvidia driver version: 456.38
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudnn64_7.dll
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.0
[pip3] torchvision==0.8.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.0.221 h74a9793_0
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38hb782905_0
[conda] mkl_fft 1.2.0 py38h45dec08_0
[conda] mkl_random 1.1.1 py38h47e9c7a_0
[conda] numpy 1.19.2 py38hadc3359_0
[conda] numpy-base 1.19.2 py38ha3acd2a_0
[conda] pytorch 1.7.0 py3.8_cuda110_cudnn8_0 pytorch
[conda] torchvision 0.8.1 py38_cu110 pytorch

cc @vfdev-5

zhiqwang commented 3 years ago

I think the interpolation methods used by F.interpolate and PIL resize are not consistent (even if you specify the same interpolation parameter), so the results will differ slightly.

vfdev-5 commented 3 years ago

@hjinlee88 thanks for the report. Currently, the output is only visually similar to PIL for the "nearest" interpolation option:

interpolation = 0
tensor_interpolate = transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
pillow_resize = transforms.Compose([transforms.Resize(size, interpolation=interpolation)])

image

In general, as @zhiqwang said, F.interpolate and PIL.resize produce different results. The magnitude of the difference depends on the interpolation mode.
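
One rough way to quantify this per mode (an illustrative sketch; it assumes the test image from the report has been saved locally as tv_image01.png, and the exact numbers depend on the torchvision version):

import numpy as np
from PIL import Image
from torchvision import transforms

img = Image.open("tv_image01.png").convert("RGB")  # hypothetical local copy of the test image
size = 112
for mode in (Image.NEAREST, Image.BILINEAR, Image.BICUBIC):
    tensor_path = transforms.Compose([transforms.ToTensor(),
                                      transforms.Resize(size, interpolation=mode),
                                      transforms.ToPILImage()])
    pil_path = transforms.Resize(size, interpolation=mode)
    diff = np.abs(np.asarray(tensor_path(img), dtype=float) - np.asarray(pil_path(img), dtype=float))
    print(mode, "mean abs diff:", diff.mean(), "max abs diff:", diff.max())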

hjinlee88 commented 3 years ago

@vfdev-5 I did a little research on this.

The easiest solution for downscaling is to downsample as much as possible with a strided convolution and then interpolate the rest of the way. Images are downsampled using a convolution with normal (Gaussian) or uniform weights and a stride. For upscaling, a transposed convolution might work. Of course, this pre-filtering should not be applied for NEAREST. Inspired by tfg.image.pyramid._upsample and tfg.image.pyramid._downsample.

Here is an example. The code is a bit dirty, but you can see what I am doing (result shown for bilinear).

import urllib
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
import torch
from torch import nn
from torch.nn import functional as F

class ResizeModify(nn.Module):
    def __init__(self, size, interpolation):
        super().__init__()
        self.size = size
        self.interpolation = interpolation
        self.Resize = transforms.Resize(size, interpolation)

    def forward(self, img):
        img = img.unsqueeze(0)
        h, w = img.shape[2:]

        if isinstance(self.size, int):
            if h > w:
                h2, w2 = int(self.size / w * h), self.size
            else:
                h2, w2 = self.size, int(self.size / h * w)
        else:
            h2, w2 = self.size

        if h2 > h:  # upscale h
            strides = int(h2 / h)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, strides, 1), 1.0)
                img = F.conv_transpose2d(img, weights, stride=(strides, 1), groups=img.shape[1])
        else:   # downscale h
            strides = int(h / h2)  # floor and int
            if strides > 1:
                # test with uniform weight, but normal (gaussian) weight will be better.
                weights = torch.full((img.shape[1], 1, strides, 1), 1 / strides)
                img = F.conv2d(img, weights, stride=(strides, 1), groups=img.shape[1])

        if w2 > w:  # upscale w
            strides = int(w2 / w)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, 1, strides), 1.0)
                img = F.conv_transpose2d(img, weights, stride=(1, strides), groups=img.shape[1])
        else:   # downscale w
            strides = int(w / w2)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, 1, strides), 1 / strides)
                img = F.conv2d(img, weights, stride=(1, strides), groups=img.shape[1])

        img = img.squeeze(0)
        return self.Resize(img)

def main():
    size = 112
    img = Image.open(urllib.request.urlopen("https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image01.png"))
    # img = Image.open("images/tv_image01.png")

    interpolation = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC][1]
    tensor_interpolate = transforms.Compose(
        [transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
    tensor_interpolate_modify = transforms.Compose(
        [transforms.ToTensor(), ResizeModify(size, interpolation=interpolation), transforms.ToPILImage()])
    pillow_resize = transforms.Compose([transforms.Resize(size, interpolation=interpolation)])

    plt.subplot(221)
    plt.imshow(img)
    plt.title("original")
    plt.subplot(222)
    plt.imshow(pillow_resize(img))
    plt.title("pillow resize")
    plt.subplot(223)
    plt.imshow(tensor_interpolate(img))
    plt.title("tensor interpolate")
    plt.subplot(224)
    plt.imshow(tensor_interpolate_modify(img))
    plt.title("tensor interpolate modify")

    plt.show()

if __name__ == "__main__":
    main()

I also found that tensor interpolate with BICUBIC behaves strangely (see the bicubic result).

vfdev-5 commented 3 years ago

@hjinlee88 thanks for investigating this. Results of resampling with convolutions look neat. Let me check and decide what can be done about this issue.

EDIT:

With mode='bicubic', it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call result.clamp(min=0, max=255) if you want to reduce the overshoot when displaying the image.

https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate

Since in the code we are working on a tensor input with dtype float32, we do not perform any clamping:

transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])

This should be fixed.

EDIT, EDIT: clamping float32 may lead to other unexpected results as we have no predefined range for float32 input vs 0-255 for uint8.

The issue is also with tensor_interpolate, which performs the following conversion: path -> PIL [0, 255] -> Tensor [0, 1] -> bicubic-resized Tensor [0-eps, 1+eps] -> PIL [0, 255] + artifacts

- tensor_interpolate = transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
+ tensor_interpolate = transforms.Compose([
    lambda x: torch.from_numpy(np.asarray(x)).permute(2, 0, 1),  # requires import numpy as np
    transforms.Resize(size, interpolation=interpolation),
    lambda x: Image.fromarray(x.permute(1, 2, 0).numpy()),
])

hjinlee88 commented 3 years ago

@vfdev-5 The below also works.

interpolation = Image.BICUBIC
tensor_interpolate = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(size, interpolation=interpolation),
    lambda x: x.clamp(0, 1),
    transforms.ToPILImage()
])

The problem could be in this line https://github.com/pytorch/vision/blob/8088cc94f2155403f6b09cd54edadafa68daa977/torchvision/transforms/functional.py#L196-L197 for the following reasons:

print(torch.tensor([255, 256, 257]).byte()) 
tensor([255,   0,   1], dtype=torch.uint8)

I suggest the change below, because mul(255) assumes that pic is float and in the range [0, 1].

pic = pic.mul(255).clamp(0, 255).byte() 

EDIT: add some explanation and suggestions.

hjinlee88 commented 3 years ago

@vfdev-5 I investigated the code I wrote earlier https://github.com/pytorch/vision/issues/2950#issuecomment-721513454.

I studied transposed convolution and found it useless here, because in this case it amounts to just copying the pixels, so it should be removed.

conv2d with weights and strides looks good because it is essentially a blur (mean or Gaussian) followed by downsampling. However, Gaussian blur weights can be tricky to implement because the kernel size must be odd and you have to choose a sigma.
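
For what it's worth, a rough sketch of one way to build odd-sized Gaussian weights for the depthwise conv2d approach (the kernel size and sigma heuristics here are hypothetical, just one possible choice):

import torch
import torch.nn.functional as F

def gaussian_kernel1d(stride):
    ksize = 2 * stride - 1                    # odd kernel size covering the downscale factor
    sigma = stride / 2.0                      # hypothetical sigma heuristic
    x = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    w = torch.exp(-(x ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def downscale_h(img, stride):
    # img: float tensor of shape (1, C, H, W); blur + stride along the height dimension only
    w = gaussian_kernel1d(stride).view(1, 1, -1, 1).repeat(img.shape[1], 1, 1, 1)
    pad = (w.shape[2] - 1) // 2
    return F.conv2d(img, w, stride=(stride, 1), padding=(pad, 0), groups=img.shape[1])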

But I don't understand the behavior of interpolate2d. Is it intended to find the location of the target pixel in the source image and compute the pixel value using only the 4 points around that location?

fmassa commented 3 years ago

@hjinlee88 interpolate in PyTorch implements interpolation following the standard approaches from OpenCV (for float values). For bilinear interpolation, each output value is computed as a weighted sum of 4 input pixels, which are determined via the input-output shapes.

For a python-based implementation of interpolate that gives the exact same result as torchvision, see https://gist.github.com/fmassa/cb2d0dff7731f6459d8ca5b5c9ea15d9 , in particular interpolate_dim, which interpolates a tensor over a single dimension. So interpolate2d can be seen as applying interpolate_dim twice.
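
To illustrate the idea, here is a minimal 1-D linear resampling sketch using the align_corners=False convention (it ignores anti-aliasing and is not the gist's actual code):

import torch

def interp1d_linear(x, out_size):
    # x: 1-D float tensor
    in_size = x.shape[0]
    scale = in_size / out_size
    # map each output index to a (fractional) source coordinate
    src = ((torch.arange(out_size, dtype=torch.float32) + 0.5) * scale - 0.5).clamp(0, in_size - 1)
    lo = src.floor().long()
    hi = (lo + 1).clamp(max=in_size - 1)
    frac = src - lo.float()
    # each output value is a weighted sum of the two nearest input samples
    return x[lo] * (1 - frac) + x[hi] * frac

x = torch.arange(8, dtype=torch.float32)
print(interp1d_linear(x, 4))  # should match F.interpolate(x.view(1, 1, -1), size=4, mode='linear')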

I took your example image and used instead OpenCV to perform bilinear interpolation, and the results from torchvision and OpenCV matched almost exactly, with just rounding differences leading to 1 (out of 255) pixel differences.

# same setup code as in the original report (imports, img, size, interpolation) should be added here

import cv2
import numpy as np

imt = np.array(img)

tt = transforms.Resize(size, interpolation=interpolation)

# convert image as tensor without casting to 0-1
# and then resize it
res_tv = tt(torch.as_tensor(imt).permute(2, 0, 1)).permute(1, 2, 0).contiguous().numpy()

# apply bilinear resize from opencv
res_cv = cv2.resize(imt, (231, 112), interpolation=cv2.INTER_LINEAR)

# compute the difference
np.abs(res_tv.astype(float) - res_cv.astype(float)).max()

# > returns 1.0
hjinlee88 commented 3 years ago

@fmassa Thank you for the explanation. I found that OpenCV already knows that cv2.resize is not the same as Pillow resize. https://github.com/opencv/opencv/pull/17068#issuecomment-613866220

I also tested bilinear resize in GIMP, the famous GNU graphics editor, and the result differs from both cv2.resize and Pillow resize. This means that resize can be different in each program.

However, I think the current situation, where the output of the same Resize class on the same input differs depending on the input type (torch Tensor or PIL Image), should be fixed.

I compared the results of the https://github.com/pytorch/vision/issues/2950#issuecomment-721513454 with the results of Pillow Resize, but the results were not the same. It's probably because the weights are different.

fmassa commented 3 years ago

However, I think the current situation, that the output for the same input of the same class Resize is different depending on the type of input (torch Tensor or pillow Image), should be fixed

I agree that having a slightly different output for the function depending if it's using PIL Images or torch Tensors is not ideal, but fixing this would be complicated.

The trade-off here is that in C++, most users would rely on OpenCV or PyTorch to perform the resizing, so it would make sense for torchvision to be compatible with both. Plus we get the benefit that those ops are already implemented in C++ / CUDA with native batch support.

fmassa commented 3 years ago

Cross-linking some discussion in https://twitter.com/ajlavin/status/1336131931314954240

Maybe it might be worth considering adding a new interpolation mode, akin to INTER_AREA from OpenCV?

tcapelle commented 3 years ago

I posted this on the PyTorch forums after banging my head yesterday because my classifier, trained with PIL image reading and served with torchvision.io.read_image, was not working at all (predicting completely flawed results). After digging, the issue comes from Resize. I had already noticed this with OpenCV resize, and the solution was using INTER_AREA. But I would expect the new API to be compatible for serving, i.e. to produce output similar to the PIL (pre-0.8) implementation that most models are trained on. A similar example to the one posted above is attached. Another solution is to put a big warning in the docs to alert users to train and serve with the same image read/resize.
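
For reference, a minimal sketch of the INTER_AREA workaround mentioned above (assumes OpenCV is installed and a local copy of the test image; this is not what torchvision does internally):

import cv2
import numpy as np
from PIL import Image

img = np.array(Image.open("tv_image01.png").convert("RGB"))  # hypothetical local copy of the test image
h, w = img.shape[:2]
small_area = cv2.resize(img, (w // 4, h // 4), interpolation=cv2.INTER_AREA)     # prefilters, closer to PIL
small_linear = cv2.resize(img, (w // 4, h // 4), interpolation=cv2.INTER_LINEAR)  # no prefilter, aliases when downscaling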

mrharicot commented 3 years ago

Resuscitating this thread: I just lost a few days chasing down a bug because we assumed the output of TF.resize would be identical whether the input was a tensor or a PIL image. It seems that Pillow prefilters before downsampling, unlike PyTorch. While the difference is minimal for upsampling, it is quite large for downsampling (see below).

Could we add a note in the documentation specifying that users should not expect the same behavior when downsampling, depending on whether they pass an image or a tensor?

import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
import PIL.Image as pil

img_pil = TF.to_pil_image(torch.randint(size=[3, 128, 128], low=0, high=255, dtype=torch.uint8))
img = TF.to_tensor(img_pil)

img_small_pil = TF.resize(img_pil, 64, interpolation=pil.BILINEAR)
img_small = TF.resize(img, 64, interpolation=pil.BILINEAR)

img_big_pil = TF.resize(img_pil, 256, interpolation=pil.BILINEAR)
img_big = TF.resize(img, 256, interpolation=pil.BILINEAR)

upsample_avg_error = torch.mean(torch.abs(TF.to_tensor(img_big_pil) - img_big)) * 255
downsample_avg_error = torch.mean(torch.abs(TF.to_tensor(img_small_pil) - img_small)) * 255

print(f"upsample_avg_error: {upsample_avg_error:0.2f}")
print(f"downsample_avg_error: {downsample_avg_error:0.2f}")

> upsample_avg_error: 0.35
> downsample_avg_error: 15.28
fmassa commented 3 years ago

Hi @mrharicot , @tcapelle

Very sorry about the situation. We are working on adding support for anti-aliasing to the tensor transforms, so that they more closely match PIL.

fmassa commented 3 years ago

cc @vfdev-5 who will be looking into addressing this

tcapelle commented 3 years ago

I wrote a summary of the issue here: https://tcapelle.github.io/pytorch/fastai/2021/02/26/image_resizing.html

iynaur commented 3 years ago

This is insane. Comparing the original image, the torch-resized image, and the PIL-resized image: look at the white points on the torch image!

iynaur commented 3 years ago

I wrote a summary of the issue here: https://tcapelle.github.io/pytorch/fastai/2021/02/26/image_resizing.html

Your article saved my day!

vfdev-5 commented 3 years ago

@iynaur since version 0.10.0 we added antialias option to produce similar results with tensors. Please, check out the following code:

import torch
import torchvision
print(torch.__version__, torchvision.__version__)
import matplotlib.pyplot as plt

import urllib
import torchvision.transforms.functional as f
from PIL import Image as Image

url = "https://user-images.githubusercontent.com/3275025/123925242-4c795b00-d9bd-11eb-9f0c-3c09a5204190.jpg"
img = Image.open(
    urllib.request.urlopen(url)
)

t_img = f.to_tensor(img)

img_small_pil = f.resize(img, 128, interpolation=Image.BILINEAR)
img_small_aa = f.to_pil_image(f.resize(t_img, 128, interpolation=Image.BILINEAR, antialias=True))
img_small = f.to_pil_image(f.resize(t_img, 128, interpolation=Image.BILINEAR, antialias=False))

plt.figure(figsize=(3 * 8, 8))
plt.subplot(131)
plt.title("PIL")
plt.imshow(img_small_pil)
plt.subplot(132)
plt.title("Tensor with antialias")
plt.imshow(img_small_aa)
plt.subplot(133)
plt.title("Tensor without antialias")
plt.imshow(img_small)

> 1.9.0 0.10.0

image

iynaur commented 3 years ago

@iynaur since version 0.10.0 we added antialias option to produce similar results with tensors. Please, check out the following code:

Thanks! Looks great. I will try when updated to that version.

tcapelle commented 3 years ago

@iynaur since version 0.10.0 we added antialias option to produce similar results with tensors. Please, check out the following code:

I will have to update my blog post with torchvision 0.10 then =)

fmassa commented 3 years ago

@tcapelle that would be nice!

Note that the antialias flag is for now in beta mode, and performance is not yet very competitive, but we will be optimizing it for the next release

tcapelle commented 3 years ago

@tcapelle that would be nice!

Done! https://tcapelle.github.io/pytorch/fastai/2021/02/26/image_resizing.html

mrharicot commented 3 years ago

@fmassa @vfdev-5 Thanks a lot, this looks great!

bnascimento commented 2 years ago

How can we use torchvision transforms Resize in C++?

vfdev-5 commented 2 years ago

@bnascimento there is no C++ API for vision transforms, but you can use pytorch C++ API which can do a similar resizing: https://pytorch.org/cppdocs/api/function_namespacetorch_1_1nn_1_1functional_1afb8b9cd051ced01899b6d3142ac2f47c.html#exhale-function-namespacetorch-1-1nn-1-1functional-1afb8b9cd051ced01899b6d3142ac2f47c

HTH

zhanwenchen commented 2 years ago

Has this bug been fixed in 0.11 or 0.12?

vfdev-5 commented 2 years ago

It will be impossible to get exactly the same result from torch interpolate and PIL resize for all interpolation modes and scales. The results are compatible and almost equal. For example, here is a test that checks the outputs: https://github.com/pytorch/vision/blob/59c4de9123eb1d39bb700f7ae7780fb9c7217910/test/test_functional_tensor.py#L548 We set the tolerance to 8.0 when computing the mean absolute error; the data is RGB uint8 in the range [0, 255]. I think we can close this issue.

hjinlee88 commented 2 years ago

@vfdev-5 I re-read this issue and found out that I accidentally put two bugs (anti-aliasing and bicubic overshoot) in one issue. Do I need to open a new issue for the bicubic overshoot, or is it OK because there is a note in the docs?

vfdev-5 commented 2 years ago

@hjinlee88 I think with the newest and recommended way of doing things (transforms.PILToTensor plus Resize(..., antialias=True)), neither issue remains.

So, your initial example would look like:

import urllib
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt

size = 112
img = Image.open(urllib.request.urlopen("https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image01.png"))
tensor_interpolate = transforms.Compose([
    transforms.PILToTensor(), 
    transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC, antialias=True), 
    transforms.ToPILImage()
])
pillow_resize = transforms.Compose([
    transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC)
])

plt.figure(figsize=(18, 18))
plt.subplot(121)
plt.imshow(tensor_interpolate(img))
plt.title("tensor interpolate")
plt.subplot(122)
plt.imshow(pillow_resize(img))
plt.title("pillow resize")

image

Please let me know if this is sufficient or there is still a problem. Thanks!

zhanwenchen commented 2 years ago

@vfdev-5 Thank you for the update. Has the performance issue with antialias=True with Resize been resolved?

parkkyungjun commented 7 months ago

@vfdev-5 Thank you for the update. Has the performance issue with antialias=True with Resize been resolved?

No, but it does make a small difference
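
For anyone who wants to check this on their own setup, a rough benchmark sketch (numbers depend heavily on hardware, image size, and library versions; it assumes the antialias flag of the functional resize, available since 0.10):

import torch
import torch.utils.benchmark as benchmark
import torchvision.transforms.functional as TF

x = torch.randint(0, 256, (3, 1024, 1024), dtype=torch.uint8)  # hypothetical test image
for aa in (False, True):
    timer = benchmark.Timer(
        stmt="TF.resize(x, 256, antialias=aa)",
        globals={"TF": TF, "x": x, "aa": aa},
    )
    print("antialias =", aa, timer.timeit(100))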

lzcchl commented 4 months ago

By chance, I discovered that even a newer version of torch (I am on 2.1.2) still has this issue in "bicubic" mode: the resized image has noticeably uneven pixels. In my test you can see black dots on the white car section (and possibly in other, less conspicuous positions) in the top-right corner of the torch.nn.functional.interpolate result, which is clearly not smooth. The attached images show, in order: the original, PIL resize to half width and half height, torch.nn.functional.interpolate to half size, and torch.nn.functional.interpolate with antialias=True to half size.

So are there any bugs that need to be fixed in "bicubic" mode?

My test code is attached (pil_torch_rsz.py.txt); just change "img_dir" to a directory containing '.jpg' or '.png' files and you can test this quickly.

vfdev-5 commented 2 months ago

The effect I tested: You can see black dots on the white car section (possibly in other inconspicuous positions) in the top right corner of torch. nn.functional. interpolate, which is clearly not smooth.

@lzcchl this is expected behaviour, as you resize a float tensor and cast it to uint8 without clamping the result to [0, 255]. See the first note in the docs: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html#torch.nn.functional.interpolate

With mode='bicubic', it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call result.clamp(min=0, max=255) if you want to reduce the overshoot when displaying the image.

Here are two options that could remove the black dots: 1) use float dtype and clamp

resized_tensor = torch.nn.functional.interpolate(
    tensor.float(),
    size=out_size,
    mode='bicubic',
    antialias=True
)
resized_tensor = resized_tensor.clamp(0, 255)
resized_tensor = resized_tensor.to(tensor.dtype)

2) use uint8 dtype directly

resized_tensor = torch.nn.functional.interpolate(
    tensor,            # uint8 tensor passed directly; the result stays in the uint8 range, so no wrap-around
    size=out_size,
    mode='bicubic',
    antialias=True
)

Let me know if I misunderstood something from your comment.

lzcchl commented 2 months ago

Thank you, this is great work and it has basically resolved my doubts.

In my code, I also compared the results of PIL, torch, and torchvision, but that's not the focus of my question, because I know there are slight differences in their implementations, which can lead to numerical differences in the results; that is understandable.

So in my future work, I will use your first suggestion because handling floats is very common.