Closed hjinlee88 closed 2 years ago
I think the interpolation methods used by F.interpolate and PIL.resize are not consistent (even if you specify the same interpolation parameter), so the inference results will be a little different.
@hjinlee88 thanks for the report. Currently, the results are only visually similar for the "nearest" option:
interpolation = 0
tensor_interpolate = transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
pillow_resize = transforms.Compose([transforms.Resize(size, interpolation=interpolation)])
In general, as @zhiqwang said, F.interpolate and PIL.resize produce different results. The difference depends on the interpolation mode:
resized_tensor = [[a, a, b, c, d, d, e, ...]]
resized_pil_img = [[a, b, c, c, d, e, f, ...]]
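One way such an index shift can arise with nearest-neighbour resizing is a different rounding convention for the source coordinate. The sketch below is only an illustration: the two helper functions are hypothetical, and which convention each library actually uses is an assumption that has not been checked against either code base here.

```python
def nearest_indices_floor(n_in, n_out):
    """Source index = floor(dst * n_in / n_out): aligned to the pixel's left edge."""
    return [(i * n_in) // n_out for i in range(n_out)]


def nearest_indices_center(n_in, n_out):
    """Source index = floor((dst + 0.5) * n_in / n_out): aligned to the pixel centre."""
    return [((2 * i + 1) * n_in) // (2 * n_out) for i in range(n_out)]


row = list("abcdefgh")
floor_idx = nearest_indices_floor(len(row), 5)
center_idx = nearest_indices_center(len(row), 5)

# The same input row yields different picks under the two conventions.
print([row[i] for i in floor_idx])
print([row[i] for i in center_idx])
```

Both conventions are valid nearest-neighbour schemes; they simply disagree on which source pixel is "nearest" when the scale is not an integer.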
@vfdev-5 I did a little research on this.
The easiest solution for downscaling is to downsample as much as possible and then interpolate. Images are downsampled using a convolution with normal (or uniform) weights and a stride. For upscaling, a transposed convolution may work. Of course, neither upsampling nor downsampling should be applied for NEAREST. Inspired by tfg.image.pyramid._upsample and tfg.image.pyramid._downsample.
Here is an example. The code is a bit dirty, but you can see what I am doing.
import urllib.request
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
import torch
from torch import nn
from torch.nn import functional as F


class ResizeModify(nn.Module):
    def __init__(self, size, interpolation):
        super().__init__()
        self.size = size
        self.interpolation = interpolation
        self.Resize = transforms.Resize(size, interpolation)

    def forward(self, img):
        # add a batch dimension: (C, H, W) -> (1, C, H, W)
        img = img.unsqueeze(0)
        h, w = img.shape[2:]
        # compute the target size (h2, w2)
        if isinstance(self.size, int):
            if h > w:
                h2, w2 = int(self.size / w * h), self.size
            else:
                h2, w2 = self.size, int(self.size / h * w)
        else:
            h2, w2 = self.size

        if h2 > h:  # upscale h
            strides = int(h2 / h)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, strides, 1), 1.0)
                img = F.conv_transpose2d(img, weights, stride=(strides, 1), groups=img.shape[1])
        else:  # downscale h
            strides = int(h / h2)  # floor and int
            if strides > 1:
                # test with uniform weight, but normal (gaussian) weight will be better.
                weights = torch.full((img.shape[1], 1, strides, 1), 1 / strides)
                img = F.conv2d(img, weights, stride=(strides, 1), groups=img.shape[1])

        if w2 > w:  # upscale w
            strides = int(w2 / w)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, 1, strides), 1.0)
                img = F.conv_transpose2d(img, weights, stride=(1, strides), groups=img.shape[1])
        else:  # downscale w
            strides = int(w / w2)
            if strides > 1:
                weights = torch.full((img.shape[1], 1, 1, strides), 1 / strides)
                img = F.conv2d(img, weights, stride=(1, strides), groups=img.shape[1])

        img = img.squeeze(0)
        # finish with the regular Resize to reach the exact target size
        return self.Resize(img)


def main():
    size = 112
    img = Image.open(urllib.request.urlopen("https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image01.png"))
    # img = Image.open("images/tv_image01.png")
    interpolation = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC][1]
    tensor_interpolate = transforms.Compose(
        [transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
    tensor_interpolate_modify = transforms.Compose(
        [transforms.ToTensor(), ResizeModify(size, interpolation=interpolation), transforms.ToPILImage()])
    pillow_resize = transforms.Compose([transforms.Resize(size, interpolation=interpolation)])

    plt.subplot(221)
    plt.imshow(img)
    plt.title("original")
    plt.subplot(222)
    plt.imshow(pillow_resize(img))
    plt.title("pillow resize")
    plt.subplot(223)
    plt.imshow(tensor_interpolate(img))
    plt.title("tensor interpolate")
    plt.subplot(224)
    plt.imshow(tensor_interpolate_modify(img))
    plt.title("tensor interpolate modify")
    plt.show()


if __name__ == "__main__":
    main()
I also found that tensor interpolation with BICUBIC behaves strangely.
@hjinlee88 thanks for investigating this. The results of resampling with convolutions look neat. Let me check and decide what can be done about this issue.
EDIT:
With mode='bicubic', it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call result.clamp(min=0, max=255) if you want to reduce the overshoot when displaying the image.
As in the code we are working on tensor input with dtype float32 we do not perform any clamping:
transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
This should be fixed.
EDIT, EDIT: clamping float32 may lead to other unexpected results as we have no predefined range for float32 input vs 0-255 for uint8.
The issue is also with tensor_interpolate, which does the following conversion: path -> PIL[0-255] -> Tensor[0-1] -> Bicubic Resized Tensor[0 - eps, 1 + eps] -> PIL[0-255] + artifacts
- tensor_interpolate = transforms.Compose([transforms.ToTensor(), transforms.Resize(size, interpolation=interpolation), transforms.ToPILImage()])
+ tensor_interpolate = transforms.Compose([
lambda x: torch.from_numpy(np.asarray(x)).permute(2, 0, 1),
transforms.Resize(size, interpolation=interpolation),
lambda x: Image.fromarray(x.permute(1, 2, 0).numpy()),
])
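To make the "0 - eps, 1 + eps" range above concrete, here is a minimal 1-D sketch of cubic convolution interpolation near an edge. The kernel coefficient a = -0.75 is an assumption for illustration (other libraries may use a different value), and cubic_weight / cubic_sample are hypothetical helper names, not library functions.

```python
def cubic_weight(x, a=-0.75):
    """Cubic convolution kernel; the value of `a` is an assumed coefficient."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def cubic_sample(p, t):
    """Interpolate between p[1] and p[2] at fractional offset t in [0, 1]."""
    offsets = (-1, 0, 1, 2)  # positions of p[0..3] relative to p[1]
    return sum(p[k] * cubic_weight(t - off) for k, off in enumerate(offsets))


# Inputs stay inside [0, 1], but the negative kernel lobes push outputs outside.
above = cubic_sample([0.0, 1.0, 1.0, 1.0], 0.5)  # bright side of an edge
below = cubic_sample([1.0, 0.0, 0.0, 0.0], 0.5)  # dark side of an edge
print(above, below)
```

With these inputs the result is slightly above 1 on the bright side and slightly below 0 on the dark side, even though every input sample lies in [0, 1].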
@vfdev-5 The below also works.
interpolation = Image.BICUBIC
tensor_interpolate = transforms.Compose([
transforms.ToTensor(),
transforms.Resize(size, interpolation=interpolation),
lambda x: x.clamp(0, 1),
transforms.ToPILImage()
])
The problem could be in this line https://github.com/pytorch/vision/blob/8088cc94f2155403f6b09cd54edadafa68daa977/torchvision/transforms/functional.py#L196-L197 for the following reason:
print(torch.tensor([255, 256, 257]).byte())
tensor([255, 0, 1], dtype=torch.uint8)
I suggest the following, because mul(255) assumes that pic is a float tensor in the range [0, 1].
pic = pic.mul(255).clamp(0, 255).byte()
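The effect of clamping before the cast can be sketched in plain Python. This only emulates the modulo-256 wraparound shown above for integers; how a real float-to-uint8 cast behaves for out-of-range values may differ by platform, so treat the helper names and the example value as assumptions.

```python
def to_uint8_wrapping(x):
    """Emulate a cast that truncates and then wraps modulo 256."""
    return int(x) % 256


def to_uint8_clamped(x):
    """Clamp to [0, 255] first, then truncate."""
    return int(min(max(x, 0.0), 255.0))


overshoot = 1.09375          # hypothetical resized value slightly above 1.0
scaled = overshoot * 255     # 278.90625, outside the uint8 range

wrapped = to_uint8_wrapping(scaled)   # a near-white pixel turns dark
clamped = to_uint8_clamped(scaled)    # the pixel saturates at white instead
print(wrapped, clamped)
```

A value that should be "a bit brighter than white" ends up dark after wrapping, which is consistent with isolated dark dots in bright regions.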
EDIT: add some explanation and suggestions.
@vfdev-5 I investigated the code I wrote earlier https://github.com/pytorch/vision/issues/2950#issuecomment-721513454.
I studied transposed convolution and found it useless here, because in this setting it amounts to just copying pixels. Therefore, it should be removed.
conv2d with weights and strides looks good because it is essentially the same as blur (mean blur or gaussian blur) and downsampling. However, weights of Gaussian blur can be difficult to implement because the shape must be odd and you have to deal with sigma.
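As a rough illustration of the Gaussian weights mentioned here, the following 1-D sketch builds a normalized kernel of odd size and applies it with a stride. The function names and the choice of sigma are hypothetical; nothing here comes from torchvision.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Return `size` Gaussian weights (size must be odd) that sum to 1."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd so that it has a centre tap")
    half = size // 2
    raw = [math.exp(-((k - half) ** 2) / (2.0 * sigma**2)) for k in range(size)]
    total = sum(raw)
    return [w / total for w in raw]


def blur_and_stride(signal, kernel, stride):
    """Valid-mode correlation followed by striding, i.e. blur then downsample."""
    k = len(kernel)
    return [
        sum(signal[s + j] * kernel[j] for j in range(k))
        for s in range(0, len(signal) - k + 1, stride)
    ]


kernel = gaussian_kernel_1d(3, sigma=1.0)           # sigma=1.0 is an arbitrary choice
smoothed = blur_and_stride([1.0] * 9, kernel, stride=2)
print(kernel, smoothed)
```

Because the weights are normalized, a constant signal is left unchanged, which is a quick sanity check for any blur kernel.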
But I don't understand the behavior of interpolate2d. Is it intended to find the location of the target pixel in the source image, and find the pixel value using only the 4 points around the location?
@hjinlee88 interpolate in PyTorch implements interpolation following the standard approaches from OpenCV (for float values). For bilinear interpolation, each output value is computed as a weighted sum of 4 input pixels, which are determined via the input-output shapes.
For a Python-based implementation of interpolate that gives the exact same result as torchvision, see https://gist.github.com/fmassa/cb2d0dff7731f6459d8ca5b5c9ea15d9 , in particular interpolate_dim, which interpolates a tensor over a single dimension. So interpolate2d can be seen as applying interpolate_dim twice.
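The single-dimension idea can be sketched in plain Python as follows. This is not the code from the gist; it is a minimal illustration that assumes the half-pixel coordinate convention (as with align_corners=False), and linear_resize_1d is a hypothetical name.

```python
import math


def linear_resize_1d(src, n_out):
    """Linear interpolation over one dimension using two neighbours per output."""
    n_in = len(src)
    scale = n_in / n_out
    out = []
    for i in range(n_out):
        # Map the output pixel centre back into input coordinates (assumed convention).
        x = max((i + 0.5) * scale - 0.5, 0.0)
        i0 = min(int(math.floor(x)), n_in - 1)
        i1 = min(i0 + 1, n_in - 1)
        lam = x - i0
        out.append((1.0 - lam) * src[i0] + lam * src[i1])
    return out


up = linear_resize_1d([0.0, 10.0], 4)
down = linear_resize_1d([0.0, 10.0, 20.0, 30.0], 2)
print(up, down)
```

Applying such a function along the rows and then along the columns uses 2 x 2 = 4 input pixels per output pixel, regardless of how strongly the image is downscaled.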
I took your example image and used instead OpenCV to perform bilinear interpolation, and the results from torchvision and OpenCV matched almost exactly, with just rounding differences leading to 1 (out of 255) pixel differences.
# same code as before that should be added here
import cv2
import numpy as np
imt = np.array(img)
tt = transforms.Resize(size, interpolation=interpolation)
# convert image as tensor without casting to 0-1
# and then resize it
res_tv = tt(torch.as_tensor(imt).permute(2, 0, 1)).permute(1, 2, 0).contiguous().numpy()
# apply bilinear resize from opencv
res_cv = cv2.resize(imt, (231, 112), interpolation=cv2.INTER_LINEAR)
# compute the difference
np.abs(res_tv.astype(float) - res_cv.astype(float)).max()
# > returns 1.0
@fmassa Thank you for the explanation. I found that the OpenCV maintainers are already aware that OpenCV's resize is not the same as Pillow's resize. https://github.com/opencv/opencv/pull/17068#issuecomment-613866220
I also tested resize bilinear in GIMP, the famous GNU graphics editor, and the result is different for both CV resize and Pillow resize. This result means that resize can be different for each program.
However, I think the current situation, that the output for the same input of the same class Resize is different depending on the type of input (torch Tensor or pillow Image), should be fixed.
I compared the results of the https://github.com/pytorch/vision/issues/2950#issuecomment-721513454 with the results of Pillow Resize, but the results were not the same. It's probably because the weights are different.
I agree that having a slightly different output for the function depending if it's using PIL Images or torch Tensors is not ideal, but fixing this would be complicated.
The trade-off here is that in C++, most users would rely on OpenCV or PyTorch to perform the resizing, so it would make sense for torchvision to be compatible with both. Plus we get the benefit that those ops are already implemented in C++ / CUDA with native batch support.
Cross-linking some discussion in https://twitter.com/ajlavin/status/1336131931314954240
Maybe it might be worth considering adding a new interpolation mode, akin to INTER_AREA from OpenCV?
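For intuition, area-style downsampling by an integer factor can be thought of as averaging each block of input pixels. The sketch below is a simplified 1-D illustration under that assumption; it is not OpenCV's implementation, and area_downsample_1d is a hypothetical helper.

```python
def area_downsample_1d(src, factor):
    """Average every `factor` consecutive samples (integer factors only)."""
    if factor < 1 or len(src) % factor != 0:
        raise ValueError("length must be a positive multiple of factor")
    return [sum(src[s:s + factor]) / factor for s in range(0, len(src), factor)]


impulse = [0, 0, 0, 100, 0, 0, 0, 0]
area = area_downsample_1d(impulse, 4)
print(area)
```

Every input sample contributes to exactly one output sample, so a single bright pixel is attenuated rather than skipped.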
I posted this on the PyTorch forums after banging my head yesterday because my classifier, trained with PIL image reading and served with torchvision.io.read_image, was not working at all (predicting completely flawed results). After digging, I found that the issue comes from Resize. I had already noticed this with OpenCV's resize, and the solution was to use INTER_AREA.
But I would expect the new API to be compatible at serving time, i.e. to produce output similar to the PIL (pre-0.8) implementation that most models were trained with.
A similar example as the one posted above:
Another solution is to put a big warning in the docs to alert users to train and serve with the same image reading and resizing.
Resuscitating this thread: I just lost a few days chasing down a bug because we assumed the output of TF.resize would be identical whether the input was a tensor or a PIL image. It seems that Pillow prefilters before downsampling, unlike PyTorch.
While the difference is minimal for upsampling, it is quite large for downsampling (see below).
Could we add a note in the documentation specifying that users should not expect the same behavior when downsampling, depending on whether they pass an image or a tensor?
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
import PIL.Image as pil
img_pil = TF.to_pil_image(torch.randint(size=[3, 128, 128], low=0, high=255, dtype=torch.uint8))
img = TF.to_tensor(img_pil)
img_small_pil = TF.resize(img_pil, 64, interpolation=pil.BILINEAR)
img_small = TF.resize(img, 64, interpolation=pil.BILINEAR)
img_big_pil = TF.resize(img_pil, 256, interpolation=pil.BILINEAR)
img_big = TF.resize(img, 256, interpolation=pil.BILINEAR)
upsample_avg_error = torch.mean(torch.abs(TF.to_tensor(img_big_pil) - img_big)) * 255
downsample_avg_error = torch.mean(torch.abs(TF.to_tensor(img_small_pil) - img_small)) * 255
print(f"upsample_avg_error: {upsample_avg_error:0.2f}")
print(f"downsample_avg_error: {downsample_avg_error:0.2f}")
upsample_avg_error: 0.35
downsample_avg_error: 15.28
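The prefiltering difference can be illustrated in 1-D with a triangle (linear) kernel. In this sketch the antialiased variant stretches the kernel by the downscale factor so that every input sample contributes. It is an assumption-laden illustration of the idea, not Pillow's or PyTorch's actual code, and resize_1d is a hypothetical helper.

```python
def triangle(x):
    """Triangle (linear) kernel with support [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def resize_1d(src, n_out, antialias):
    """Resize with a triangle kernel; optionally widen it when downsampling."""
    n_in = len(src)
    scale = n_in / n_out
    width = max(scale, 1.0) if antialias else 1.0
    out = []
    for i in range(n_out):
        center = (i + 0.5) * scale  # output pixel centre in input coordinates
        weights = [triangle((x + 0.5 - center) / width) for x in range(n_in)]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, src)) / total)
    return out


impulse = [0, 0, 0, 100, 0, 0, 0, 0]
no_aa = resize_1d(impulse, 2, antialias=False)  # the bright sample is skipped
with_aa = resize_1d(impulse, 2, antialias=True)  # the bright sample is averaged in
print(no_aa, with_aa)
```

On noisy or high-frequency input, skipping samples versus averaging them is what produces the large downsampling error, while upsampling is unaffected because the kernel width stays at 1.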
Hi @mrharicot , @tcapelle
Very sorry about the situation. We are working on adding support for anti-aliasing in Tensor transforms, so that they more closely match PIL.
cc @vfdev-5 who will be looking into addressing this
I wrote a summary of the issue here: https://tcapelle.github.io/pytorch/fastai/2021/02/26/image_resizing.html
This is insane. Look at the white points on the torch image! [Images: original image, torch-resized image, PIL-resized image]
Your article saved my day!
@iynaur since version 0.10.0 we added the antialias option to produce similar results with tensors. Please check out the following code:
import torch
import torchvision
print(torch.__version__, torchvision.__version__)
import matplotlib.pyplot as plt
import urllib.request
import torchvision.transforms.functional as f
from PIL import Image as Image
url = "https://user-images.githubusercontent.com/3275025/123925242-4c795b00-d9bd-11eb-9f0c-3c09a5204190.jpg"
img = Image.open(
urllib.request.urlopen(url)
)
t_img = f.to_tensor(img)
img_small_pil = f.resize(img, 128, interpolation=Image.BILINEAR)
img_small_aa = f.to_pil_image(f.resize(t_img, 128, interpolation=Image.BILINEAR, antialias=True))
img_small = f.to_pil_image(f.resize(t_img, 128, interpolation=Image.BILINEAR, antialias=False))
plt.figure(figsize=(3 * 8, 8))
plt.subplot(131)
plt.title("PIL")
plt.imshow(img_small_pil)
plt.subplot(132)
plt.title("Tensor with antialias")
plt.imshow(img_small_aa)
plt.subplot(133)
plt.title("Tensor without antialias")
plt.imshow(img_small)
> 1.9.0 0.10.0
Thanks! Looks great. I will try when updated to that version.
I will have to update my blog post with torchvision 0.10 then =)
@tcapelle that would be nice!
Note that the antialias flag is for now in beta mode, and performance is not yet very competitive, but we will be optimizing it for the next release.
Done! https://tcapelle.github.io/pytorch/fastai/2021/02/26/image_resizing.html
@fmassa @vfdev-5 Thanks a lot, this looks great!
how can we use vision transforms resize on c++?
@bnascimento there is no C++ API for vision transforms, but you can use pytorch C++ API which can do a similar resizing: https://pytorch.org/cppdocs/api/function_namespacetorch_1_1nn_1_1functional_1afb8b9cd051ced01899b6d3142ac2f47c.html#exhale-function-namespacetorch-1-1nn-1-1functional-1afb8b9cd051ced01899b6d3142ac2f47c
HTH
Has this bug been fixed in 0.11 or 0.12?
It will be impossible to get exactly the same result from torch interpolate and PIL resize for all interpolation modes and scales. Results are compatible and almost equal. For example, here is a test that checks the outputs: https://github.com/pytorch/vision/blob/59c4de9123eb1d39bb700f7ae7780fb9c7217910/test/test_functional_tensor.py#L548 We set the tolerance to 8.0 on the mean absolute error; the data is RGB uint8 in the range [0, 255]. I think we can close this issue.
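The kind of check described here can be sketched as a mean absolute error compared against a tolerance. This is not the linked test; the pixel values below are made up for illustration and mean_abs_error is a hypothetical helper.

```python
def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized pixel sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


TOLERANCE = 8.0                     # tolerance value quoted in the comment
pil_like = [10, 200, 35, 255]       # made-up uint8 values
tensor_like = [12, 190, 40, 250]    # made-up uint8 values

mae = mean_abs_error(pil_like, tensor_like)
print(mae, mae < TOLERANCE)
```

A mean-based tolerance accepts small per-pixel deviations everywhere but can also hide a few large local errors, which is worth keeping in mind when interpreting such a test.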
@vfdev-5 I re-read this issue and found out that I accidentally put two bugs (anti-alias and bicubic overshoot) in one issue. Do I need to open a new issue to fix the bicubic overshoot or is it ok because there is a note in the docs?
@hjinlee88 I think that with the newest and recommended way of doing things, neither issue remains:
1) ToTensor() is going to be deprecated. We suggest using PILToTensor() instead. With PILToTensor() there is no range rescaling as with ToTensor().
2) Use antialias=True with Resize.
So, your initial example would look like:
import urllib.request
from PIL import Image
from torchvision import transforms
from matplotlib import pyplot as plt
size = 112
img = Image.open(urllib.request.urlopen("https://pytorch.org/tutorials/_static/img/tv_tutorial/tv_image01.png"))
tensor_interpolate = transforms.Compose([
transforms.PILToTensor(),
transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC, antialias=True),
transforms.ToPILImage()
])
pillow_resize = transforms.Compose([
transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC)
])
plt.figure(figsize=(18, 18))
plt.subplot(121)
plt.imshow(tensor_interpolate(img))
plt.title("tensor interpolate")
plt.subplot(122)
plt.imshow(pillow_resize(img))
plt.title("pillow resize")
Please let me know if this is sufficient or there is still a problem. Thanks!
@vfdev-5 Thank you for the update. Has the performance issue with antialias=True with Resize been resolved?
No, but it does make a small difference
By chance, I discovered that even a newer version of Torch (I am on 2.1.2) still has this issue in "bicubic" mode: the resized image pixels show significant unevenness. The effect I tested: you can see black dots on the white car section in the top right corner (and possibly in other, less conspicuous positions) of the torch.nn.functional.interpolate result, which is clearly not smooth. [Images: original image; PIL resize to half width and half height; torch.nn.functional.interpolate to half width and half height; torch.nn.functional.interpolate with antialias=True to half width and half height]
So are there any bugs that need to be fixed in "bicubic" mode?
My test code is attached (pil_torch_rsz.py.txt). Just change "img_dir" to a directory containing '.jpg' or '.png' files and you can test this quickly.
@lzcchl this is an expected behaviour as you resize a float tensor and cast to uint8 without clamping the range into [0, 255]. See the first note in the docs: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html#torch.nn.functional.interpolate
With mode='bicubic', it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call result.clamp(min=0, max=255) if you want to reduce the overshoot when displaying the image.
Here are two options that could remove the black dots: 1) use float dtype and clamp
resized_tensor = torch.nn.functional.interpolate(
tensor.float(),
size=out_size,
mode='bicubic',
antialias=True
)
resized_tensor = resized_tensor.clamp(0, 255)
resized_tensor = resized_tensor.to(tensor.dtype)
2) use uint8 dtype directly
resized_tensor = torch.nn.functional.interpolate(
tensor,
size=out_size,
mode='bicubic',
antialias=True
)
Let me know if I misunderstood something from your comment.
Thank you, this is a great job and you have basically solved my doubts.
In my code, I also compared the results of pil, torch, and torchvision, but that's not the focus of my question because I know there are slight differences in their implementation methods, which can lead to differences in digital results, which is understandable.
So in my future work, I will use your first suggestion because handling floats is very common.
🐛 Bug
Resize supports tensors by F.interpolate, but the behavior is not the same as Pillow resize. https://github.com/pytorch/vision/blob/f95b0533243dfbc901b5ed5f5db28a5a46bdb699/torchvision/transforms/functional.py#L309-L312
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Both should have the same or nearly identical output. Perhaps, it needs blur before interpolate.
Environment
I installed pytorch using the following command: conda install pytorch torchvision -c pytorch
python collect_env.py
Collecting environment information...
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Home
GCC version: (MinGW.org GCC-8.2.0-3) 8.2.0
Clang version: Could not collect
CMake version: version 3.18.2
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2060
Nvidia driver version: 456.38
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudnn64_7.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.0
[pip3] torchvision==0.8.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.0.221 h74a9793_0
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38hb782905_0
[conda] mkl_fft 1.2.0 py38h45dec08_0
[conda] mkl_random 1.1.1 py38h47e9c7a_0
[conda] numpy 1.19.2 py38hadc3359_0
[conda] numpy-base 1.19.2 py38ha3acd2a_0
[conda] pytorch 1.7.0 py3.8_cuda110_cudnn8_0 pytorch
[conda] torchvision 0.8.1 py38_cu110 pytorch
cc @vfdev-5