pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension for v2 transforms #8622

Open lxr2 opened 1 week ago

lxr2 commented 1 week ago

🐛 Describe the bug

It seems that v2.Pad does not support padding sizes greater than the image size, while v1.Pad does. I hope that v2.Pad will allow this in the future as well.

from torchvision.transforms import v2
import torchvision.transforms as T
from torchvision.transforms import functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)

# Not supported: padding the tensor image raises a RuntimeError
trans_img = v2.Compose([v2.ToImage(), T.Pad(padding=36, padding_mode='reflect')])(orig_img)

# Supported: padding the PIL image works
trans_img = T.Compose([T.Pad(padding=36, padding_mode='reflect')])(orig_img)

Versions

Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 546.80
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
...
[conda] torch                     2.4.0                    pypi_0    pypi
[conda] torchmetrics              1.4.0.post0              pypi_0    pypi
[conda] torchvision               0.19.0                   pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi
venkatram-dev commented 1 week ago

Not sure of the reason to combine v1 and v2 together in v2.Compose([v2.ToImage(), T.Pad(padding=36, padding_mode='reflect')]); the v1 T.Pad is being mixed into a v2 pipeline.

The code below works (tested in Google Colab); please try this.


from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)

# Using the v2 API for padding, applied directly to the PIL image
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),
])

# Apply the transformation
trans_img = transform(orig_img)
lxr2 commented 1 week ago

It works, but according to the docs, the standard pipeline should include v2.ToImage() when the input is a PIL image. I am confused about this.


This is what a typical transform pipeline could look like:

from torchvision.transforms import v2
transforms = v2.Compose([
    v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # optional, most inputs are already uint8 at this point
    # ...
    v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
    # ...
    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
venkatram-dev commented 1 week ago

Below is my understanding, others can chime in as needed :)

Yeah, that is a good point. In my opinion, maybe that doc should be clearer about the difference between padding a PIL image and padding a tensor.

If we look at the other padding examples in the docs, they use PIL images: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_illustrations.html#sphx-glr-auto-examples-transforms-plot-transforms-illustrations-py

Anyway, this is my understanding:

Extra padding (a padding size greater than the image size) works on a PIL image, but not on a tensor. So if we need extra padding, it has to be applied while the image is still a PIL image; the other tensor operations can follow after it (a sketch of such a pipeline is below).
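For example, a minimal sketch of that ordering (the normalization constants are taken from the docs snippet above; adjust as needed):

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = F.to_pil_image(torch.rand(3, 32, 32))

transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # pad while still a PIL image
    T2.ToImage(),                                # then convert to a tensor image
    T2.ToDtype(torch.float32, scale=True),       # Normalize expects float input
    T2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

trans_img = transform(orig_img)
print(trans_img.shape)  # torch.Size([3, 104, 104])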

Root Cause Analysis:

Padding a PIL image goes through Pillow and NumPy functions, which do not check the padding size against the image dimensions:

https://github.com/pytorch/vision/blob/main/torchvision/transforms/_functional_pil.py#L144-L220

Padding a tensor goes through PyTorch's reflection-padding kernel, which strictly checks the padding size against the input dimensions:

https://github.com/pytorch/pytorch/blob/d14fe3ffeddff743af09ce7c8d91127940ddf7ed/aten/src/ATen/native/ReflectionPad.cpp#L241-L249

My understanding is that PyTorch enforces this check because reflect padding mirrors pixels from the interior of the input; a padding size greater than or equal to the input dimension would index outside the tensor, so the check keeps all reads within the allocated memory bounds and avoids crashes or data corruption.
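This restriction is easy to reproduce with torch.nn.functional.pad directly (a minimal sketch; 'constant' is just one example of a mode without the restriction):

import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 32, 32)

# 'reflect' mirrors interior pixels, so each pad must be strictly
# smaller than the corresponding input dimension
try:
    F.pad(x, (36, 36, 36, 36), mode='reflect')
except RuntimeError as e:
    print(e)  # Padding size should be less than the corresponding input dimension ...

# 'constant' padding does not read mirrored pixels, so it has no such limit
y = F.pad(x, (36, 36, 36, 36), mode='constant', value=0)
print(y.shape)  # torch.Size([1, 3, 104, 104])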

Scenario 1: Extra padding (padding size greater than the image size) works on a PIL image.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig size', orig_img.size)

# Using the v2 API: pad the PIL image first, then convert to a tensor image
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # applied to the PIL image
    T2.ToImage(),
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)

The code above works.

Scenario 2: Extra padding does not work on a tensor.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig size', orig_img.size)

# Using the v2 API: convert to a tensor image first, then pad
transform = T2.Compose([
    T2.ToImage(),
    T2.Pad(padding=36, padding_mode='reflect'),  # applied to the tensor image
])

# Apply the transformation; this raises the RuntimeError below
trans_img = transform(orig_img)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]

from torchvision.transforms import v2 as T2
import torch

# Same failure with a plain tensor as input
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),
])

# Apply the transformation; this raises the RuntimeError below
trans_img = transform(orig_img)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]


from torchvision.transforms import v2 as T2
import torch

# Same failure when the tensor is first wrapped as a tv_tensors.Image
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

transform = T2.Compose([
    T2.ToImage(),  # convert the tensor to tv_tensors.Image
    T2.Pad(padding=36, padding_mode='reflect'),
])

# Apply the transformation; this raises the RuntimeError below
trans_img = transform(orig_img)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]
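If extra reflect padding is really needed on a tensor, one possible workaround (a sketch, not an official torchvision API) is to drop down to numpy.pad, which is what the PIL code path linked above uses for reflect mode and which keeps reflecting when the pad exceeds the axis size:

import numpy as np
import torch

x = torch.rand(3, 32, 32)
p = 36  # pad larger than the 32-pixel input

# np.pad keeps reflecting when the pad is larger than the axis,
# matching the behavior of the PIL code path
padded = np.pad(x.numpy(), ((0, 0), (p, p), (p, p)), mode='reflect')
out = torch.from_numpy(padded)
print(out.shape)  # torch.Size([3, 104, 104])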

Scenario 3: Padding with a size less than the input dimension works on a tensor.

from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig size', orig_img.size)

# A padding of 30 is less than the 32-pixel input, so the tensor path works
transform = T2.Compose([
    T2.ToImage(),
    T2.Pad(padding=30, padding_mode='reflect'),
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)

The code above works.


from torchvision.transforms import v2 as T2
import torch

# The same holds for a plain tensor: 31 < 32, so the padding succeeds
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

transform = T2.Compose([
    T2.Pad(padding=31, padding_mode='reflect'),
])

# Apply the transformation
trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)
The code above works.
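Note that the check is a strict inequality, so 31 is the largest reflect pad that works on a 32-pixel dimension (a quick sketch):

import torch
from torchvision.transforms import v2 as T2

x = torch.rand(3, 32, 32)

# 31 < 32: works
print(T2.Pad(padding=31, padding_mode='reflect')(x).shape)  # torch.Size([3, 94, 94])

# 32 is not strictly less than 32: raises the RuntimeError
try:
    T2.Pad(padding=32, padding_mode='reflect')(x)
except RuntimeError as e:
    print(e)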

lxr2 commented 1 week ago

Many thanks, very clear explanations and instructions!