pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

`affine` creates artefacts on the edges of the image #8083

Open antoinebrl opened 1 year ago

antoinebrl commented 1 year ago

🐛 Describe the bug

The `affine` functional (both v1 and v2) introduces black borders along the edges of the transformed image, even when the fill value matches the image content. The borders appear with both uint8 and float32 inputs, and reproduce consistently on both Ubuntu and Mac M1.

Comparing torchvision's `affine` with Kornia's implementation, I am not sure whether the interpolation issue is limited to the image edges. With the same parameters, Kornia's output on a real image also looks cleaner.

import torch
import torchvision
from torchvision.transforms.v2.functional import affine
from torchvision.tv_tensors import Image
from kornia.geometry.transform import get_affine_matrix2d, warp_affine
from torchvision.transforms import InterpolationMode

image = Image(128 * torch.ones((3, 240, 200), dtype=torch.float))

angle = 30
trans = (0, 0)
scale = 1.0
shear = (0, 0)
center = (image.shape[-1] / 2, image.shape[-2] / 2)
inter = InterpolationMode.BILINEAR
fill = [128, 128, 128]

M = get_affine_matrix2d(
    torch.Tensor([trans]),
    torch.Tensor([center]),
    torch.Tensor([[scale, scale]]),
    torch.Tensor([angle]),
    torch.Tensor([shear[0]]),
    torch.Tensor([shear[1]]),
)

kn_img = warp_affine(
    image.unsqueeze(0),
    M[:, :2],
    image.shape[-2:],
    mode="bilinear",
    padding_mode="fill",
    fill_value=torch.tensor(fill),
    align_corners=False,
)

tv_img = affine(
    image,
    angle=angle,
    translate=trans,
    scale=scale,
    shear=shear,
    fill=fill,
    interpolation=inter,
)

torchvision.io.write_png(kn_img[0].to(dtype=torch.uint8), "affine_kornia.png")
torchvision.io.write_png(tv_img.to(dtype=torch.uint8), "affine_torchvision.png")

[Output images, side by side: affine_kornia (Kornia) and affine_torchvision (Torchvision)]

Versions

PyTorch version: 2.1.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.6 (arm64)
GCC version: Could not collect
Clang version: 15.0.0 (clang-1500.0.40.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.9 (main, Jun 29 2023, 12:23:23) [Clang 14.0.3 (clang-1403.0.22.14.1)] (64-bit runtime)
Python platform: macOS-13.6-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] mypy==1.6.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.25.2
[pip3] onnx==1.14.1
[pip3] pytorch-lightning==2.0.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.1.0
[pip3] torch-optimizer==0.3.0
[pip3] torchdata==0.6.1
[pip3] torchmetrics==1.0.3
[pip3] torchtext==0.15.2
[pip3] torchvision==0.16.0

cc @vfdev-5

pmeier commented 1 year ago

This is #6517

antoinebrl commented 1 year ago

Thanks @pmeier! If I understand correctly, the issue is restricted to the edges of the image, and the rest of the grid sampling/interpolation works correctly. Is that right?

pmeier commented 12 months ago

> If I understand correctly, the issue is restricted to the edges of the image, and the rest of the grid sampling/interpolation works correctly. Is that right?

Correct. And to be even more specific, it only happens for bilinear interpolation:

https://github.com/pytorch/vision/blob/f69eee6108cd047ac8b62a2992244e9ab3c105e1/torchvision/transforms/_functional_tensor.py#L567-L571

It is the blending between the fill mask and the image that goes wrong.
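To make the blending problem concrete, here is a minimal pure-Python sketch in 1D. It assumes (this is a reading of the linked lines, not something confirmed in this thread) that the image and an all-ones mask are both sampled with zero padding, and that the bilinear branch then computes `img * mask + (1 - mask) * fill`. The helper `sample_zero_padded` and the variable names are hypothetical and only serve the illustration. Under that assumption the sampled image is already attenuated by the mask weight at the border, so multiplying by the mask a second time darkens the edge.

```python
import math


def sample_zero_padded(row, x):
    """Linearly interpolate `row` at fractional position x.

    Taps that fall outside the row contribute 0, mimicking zero padding.
    (Hypothetical helper, not a torchvision function.)
    """
    x0 = math.floor(x)
    w1 = x - x0  # weight of the right-hand tap

    def tap(i):
        return row[i] if 0 <= i < len(row) else 0.0

    return (1.0 - w1) * tap(x0) + w1 * tap(x0 + 1)


value = 128.0           # constant image content, as in the report
fill = 128.0            # fill value matching the content
image = [value] * 4
mask = [1.0] * 4        # all-ones mask sampled alongside the image

# Sample half a pixel outside the left edge: half the footprint is padding.
x = -0.5
img_s = sample_zero_padded(image, x)    # image value already scaled by 0.5
mask_s = sample_zero_padded(mask, x)    # coverage of valid pixels: 0.5

# Assumed current blend: the mask weight is applied a second time.
blended = img_s * mask_s + (1.0 - mask_s) * fill

# One possible alternative: treat the sampled image as already weighted.
premultiplied = img_s + (1.0 - mask_s) * fill

print(blended, premultiplied)
```

With these numbers the assumed blend gives 96.0 instead of 128.0, which would show up as a darker rim, while the alternative keeps 128.0. The alternative is only one way the blend could be written and is not presented here as the project's fix.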

pmeier commented 12 months ago

Here is an adapted script that shows the difference between nearest and bilinear interpolation:

import torch
import torchvision
from torchvision.transforms.v2.functional import affine
from torchvision.tv_tensors import Image
from torchvision.transforms import InterpolationMode

image = Image(128 * torch.ones((3, 240, 200), dtype=torch.float))

angle = 30
trans = (0, 0)
scale = 1.0
shear = (0, 0)
center = (image.shape[-1] / 2, image.shape[-2] / 2)
fill = [128, 128, 128]

for inter in [InterpolationMode.NEAREST, InterpolationMode.BILINEAR]:
    tv_img = affine(
        image,
        angle=angle,
        translate=trans,
        scale=scale,
        shear=shear,
        fill=fill,
        interpolation=inter,
    )
    torchvision.io.write_png(tv_img.to(dtype=torch.uint8), f"{inter.value}.png")

amanikiruga commented 2 months ago

This hasn't been fixed yet.