lxr2 opened 1 week ago
Not sure of the reason to combine v1 and v2 together, as in `v2.Compose([v2.ToImage(), T.Pad(padding=36, ...)])`, where the v1 `T.Pad` is mixed into a v2 pipeline.
The code below works (tested in Google Colab). Please try this.
```python
from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)

# Using v2 API for padding
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),  # Use v2.Pad directly
    # T2.ToTensor()
])

# Apply transformation
trans_img = transform(orig_img)
```
It works, but according to the docs, the standard steps should include `v2.ToImage()` when the input is a PIL image. I am confused about this.
This is what a typical transform pipeline could look like:
```python
import torch
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # optional, most input are already uint8 at this point
    # ...
    v2.RandomResizedCrop(size=(224, 224), antialias=True),  # Or Resize(antialias=True)
    # ...
    v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```
Below is my understanding, others can chime in as needed :)
Yeah, that is a good point. In my opinion, the doc should be clearer about the difference between the padding operation on a PIL image and on a tensor.
If we look at other docs for padding, they use PIL images: https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_illustrations.html#sphx-glr-auto-examples-transforms-plot-transforms-illustrations-py
Anyway, this is my understanding: extra padding (padding size greater than the image size) works on a PIL image, but it does not work on a tensor. So if we need extra padding, it has to be applied to the PIL image; the other tensor operations can then follow.
Root Cause Analysis:
Padding a PIL image goes through Pillow and NumPy functions, which do not check the padding size against the image dimensions:
https://github.com/pytorch/vision/blob/main/torchvision/transforms/_functional_pil.py#L144-L220
Padding a tensor goes through PyTorch code, which strictly checks the padding size against the input dimensions.
My understanding is that PyTorch does these internal checks to prevent the padding operation from indexing past the tensor's dimensions, keeping all computations within the allocated memory bounds and avoiding crashes or data corruption.
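To see the difference outside of torchvision, here is a minimal sketch comparing plain NumPy and plain PyTorch reflect padding (no torchvision involved; this is only to illustrate the check):

```python
import numpy as np
import torch
import torch.nn.functional as F

arr = np.arange(4)
# NumPy's reflect mode keeps reflecting back and forth, so the pad
# width may exceed the array size
padded = np.pad(arr, 6, mode='reflect')
print(padded.shape)  # (16,)

t = torch.arange(4.0).reshape(1, 1, 1, 4)
try:
    # torch's reflect pad requires padding < the corresponding input dimension
    F.pad(t, (6, 6, 0, 0), mode='reflect')
except RuntimeError as e:
    print(type(e).__name__, e)
```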
Scenario 1: Extra padding (padding size greater than the image size) works on a PIL image.
```python
from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig shape', orig_img.size)

# Pad the PIL image first, then convert to a tensor
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),
    T2.ToImage(),
])

trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)
```
Above code works
Scenario 2: Extra padding does not work on a tensor.
```python
from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig shape', orig_img.size)

# Convert to a tensor first, then pad
transform = T2.Compose([
    T2.ToImage(),
    T2.Pad(padding=36, padding_mode='reflect'),
])

trans_img = transform(orig_img)  # raises
```

```
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]
```
```python
from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor (a plain tensor, no PIL involved)
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

# Define a transformation pipeline with the v2 API
transform = T2.Compose([
    T2.Pad(padding=36, padding_mode='reflect'),
])

trans_img = transform(orig_img)  # raises
```

```
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]
```
```python
from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

# Same pipeline, but converting to tv_tensors.Image first
transform = T2.Compose([
    T2.ToImage(),  # Convert tensor to tv_tensors.Image
    T2.Pad(padding=36, padding_mode='reflect'),
])

trans_img = transform(orig_img)  # raises
```

```
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (36, 36) at dimension 3 of input [1, 3, 32, 32]
```
Scenario 3: Padding with a size less than the input dimension works on a tensor.
```python
from torchvision.transforms import v2 as T2
import torchvision.transforms.functional as F
import torch

orig_img = torch.rand([3, 32, 32])
orig_img = F.to_pil_image(orig_img)
print('orig type', type(orig_img))
print('orig shape', orig_img.size)

# Padding (30) is smaller than the image size (32), so the tensor path works
transform = T2.Compose([
    T2.ToImage(),
    T2.Pad(padding=30, padding_mode='reflect'),
])

trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)
```
Above code works
```python
from torchvision.transforms import v2 as T2
import torch

# Create a random image tensor
orig_img = torch.rand([3, 32, 32])
print('orig type', type(orig_img))
print('orig shape', orig_img.shape)

# Padding (31) is still smaller than the image size (32)
transform = T2.Compose([
    T2.Pad(padding=31, padding_mode='reflect'),
])

trans_img = transform(orig_img)
print('trans_img type', type(trans_img))
print('trans_img shape', trans_img.shape)
```
Above code works
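If someone really needs padding larger than the tensor size while staying on the tensor path, one possible workaround (a sketch only; `reflect_pad_large` is not a torchvision function) is to apply the reflect pad in chunks of at most `size - 1`. Each chunk then passes PyTorch's check, and a chunk of exactly `size - 1` places the new boundary on a mirror point of the reflection, so further reflect pads continue the same extension:

```python
import numpy as np
import torch
import torch.nn.functional as F

def reflect_pad_large(t, pad):
    # Pad the last dimension by `pad` on each side, in chunks of at most
    # (current size - 1) so every individual F.pad call passes torch's check.
    remaining = pad
    while remaining > 0:
        step = min(remaining, t.shape[-1] - 1)
        t = F.pad(t, (step, step), mode='reflect')
        remaining -= step
    return t

x = torch.arange(4.0).reshape(1, 1, 4)  # (N, C, W); reflect pad needs a 3D tensor here
out = reflect_pad_large(x, 6)
ref = np.pad(np.arange(4.0), 6, mode='reflect')  # NumPy allows the large pad directly
print(out.shape)                                  # torch.Size([1, 1, 16])
print(np.allclose(out.numpy().flatten(), ref))    # True
```

Here the chunked result is checked against NumPy's multi-reflection output for the last dimension; extending it to both spatial dimensions would follow the same idea.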
Many thanks, very clear explanations and instructions!
🐛 Describe the bug

It seems that `v2.Pad` does not support cases where the padding size is greater than the image size, but `v1.Pad` does support this. I hope that `v2.Pad` will allow this in the future as well.

Versions