pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

String parameter cannot be passed to `torchvision.ops.deform_conv2d` #6394

Open dario-loi opened 2 years ago

dario-loi commented 2 years ago

🐛 Describe the bug

Bug Explanation

The current implementation of torchvision.ops.deform_conv2d implicitly assumes that padding is passed as a tuple or an integer. If padding is instead passed as a string such as "same" or "valid", torchvision treats the string as a sequence and tries to unpack it into two values, which fails.

Sample Code

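import torch
import torchvision

X = torch.randn((1, 3, 5, 5))  # input: (batch, channels, height, width)
torchvision.ops.deform_conv2d(
    X,
    torch.randn([1, 2 * 1 * 3 * 3, 3, 3]),  # offset
    torch.randn([1, 3, 3, 3]),  # weight
    torch.randn([1]),  # bias
    stride=1,
    padding="zero",  # a string instead of an int or a tuple
    dilation=1,
    mask=None,
)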

Stacktrace

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
d:\Documents\Python Scripts\test.ipynb Cell 13 in <cell line: 2>()
      1 X = torch.randn((1,3,5,5))
----> 2 torchvision.ops.deform_conv2d(
      3             X,
      4             torch.randn([1, 2 * 1 * 3 * 3, 3, 3]), #offset
      5             torch.randn([1, 3, 3, 3]),
      6             torch.randn([1]),
      7             stride=1,
      8             padding="zero",
      9             dilation=1,
     10             mask=None,
     11         )

File d:\Installations\Anaconda\lib\site-packages\torchvision\ops\deform_conv.py:77, in deform_conv2d(input, offset, weight, bias, stride, padding, dilation, mask)
     74     bias = torch.zeros(out_channels, device=input.device, dtype=input.dtype)
     76 stride_h, stride_w = _pair(stride)
---> 77 pad_h, pad_w = _pair(padding)
     78 dil_h, dil_w = _pair(dilation)
     79 weights_h, weights_w = weight.shape[-2:]

ValueError: too many values to unpack (expected 2)
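
For readers wondering why this surfaces as an unpacking error rather than a type error: a Python string is itself iterable, so a helper that converts any iterable into a tuple yields one element per character. The sketch below uses `pair_like`, a simplified stand-in written for illustration (an assumption about how `_pair` behaves, not the torch source), to reproduce the failure without torch installed.

```python
import collections.abc
from itertools import repeat


def pair_like(x):
    """Simplified stand-in for `_pair` (hypothetical, for illustration only)."""
    # Any iterable is converted to a tuple as-is, and a str is an iterable.
    if isinstance(x, collections.abc.Iterable):
        return tuple(x)
    # A scalar is repeated to form a pair.
    return tuple(repeat(x, 2))


print(pair_like(1))        # (1, 1)
print(pair_like((1, 2)))   # (1, 2)
print(pair_like("zero"))   # ('z', 'e', 'r', 'o')

try:
    pad_h, pad_w = pair_like("zero")  # four characters, two targets
    message = ""
except ValueError as exc:
    message = str(exc)

print(message)  # begins with "too many values to unpack"
```

Under this reading, any string longer than two characters fails in the same way, regardless of whether it names a valid padding mode.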

Versions

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
Clang version: Could not collect
CMake version: version 3.24.0-rc1
Libc version: N/A

Python version: 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 11.5.50
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650
Nvidia driver version: 512.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] numpydoc==1.4.0
[pip3] torch==1.12.1
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h59b6b97_2
[conda] mkl                       2021.4.0           haa95532_640
[conda] mkl-service               2.4.0            py38h2bbff1b_0
[conda] mkl_fft                   1.3.1            py38h277e83a_0
[conda] mkl_random                1.2.2            py38hf11a4ad_0
[conda] numpy                     1.23.1           py38h7a0a035_0
[conda] numpy-base                1.23.1           py38hca35cd5_0
[conda] numpydoc                  1.4.0            py38haa95532_0
[conda] pytorch                   1.12.1          py3.8_cuda11.3_cudnn8_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.12.1                   pypi_0    pypi
[conda] torch-tb-profiler         0.4.0                    pypi_0    pypi
[conda] torchaudio                0.12.1               py38_cu113    pytorch
[conda] torchvision               0.13.1               py38_cu113    pytorch

datumbox commented 2 years ago

Hi @dario-loi, thanks for raising this.

I don't want to argue semantics, but TorchVision quite explicitly declares that the expected input for padding is either an integer or a tuple of ints: https://github.com/pytorch/vision/blob/93c85bbcc31f8d5a052daf06f2f91f39697af1a4/torchvision/ops/deform_conv.py#L14-L23

Currently it doesn't support strings ({'same', 'valid'}). Perhaps that's a worthwhile extension, and we would possibly have to follow the same approach as _ConvNd from PyTorch Core. It's not in our plans to do this now, but if you are interested in sending a PR, we can certainly review it.
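
To make the string options concrete, here is a minimal sketch of how 'same' and 'valid' could be turned into per-side padding amounts. `resolve_padding` is a hypothetical helper, not part of torchvision or PyTorch; the arithmetic (total padding of `dilation * (kernel_size - 1)`, with the smaller half placed first) is assumed to mirror what `_ConvNd` does, and the stride restriction is likewise an assumption borrowed from the core convolution layers.

```python
from typing import List, Sequence, Tuple, Union


def resolve_padding(
    padding: Union[str, int, Sequence[int]],
    kernel_size: Sequence[int],
    dilation: Sequence[int] = (1, 1),
    stride: Sequence[int] = (1, 1),
) -> List[Tuple[int, int]]:
    """Hypothetical helper: (before, after) padding for each spatial dim."""
    if isinstance(padding, str):
        if padding == "valid":
            return [(0, 0) for _ in kernel_size]
        if padding == "same":
            # Assumption: 'same' is only meaningful for stride 1.
            if any(s != 1 for s in stride):
                raise ValueError("padding='same' requires stride 1 in this sketch")
            result = []
            for k, d in zip(kernel_size, dilation):
                total = d * (k - 1)        # padding needed to keep the size
                before = total // 2        # smaller half goes first
                result.append((before, total - before))
            return result
        raise ValueError(f"unsupported padding string: {padding!r}")
    if isinstance(padding, int):
        return [(padding, padding) for _ in kernel_size]
    # A tuple of ints means symmetric padding per dimension.
    return [(p, p) for p in padding]


print(resolve_padding("same", (3, 3)))   # [(1, 1), (1, 1)]
print(resolve_padding("same", (4, 4)))   # [(1, 2), (1, 2)]
print(resolve_padding("valid", (3, 3)))  # [(0, 0), (0, 0)]
```

Note that an even kernel size produces different amounts before and after, which the current int-or-tuple interface has no way to express.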

dario-loi commented 2 years ago

Thank you for the response. I've begun working on a possible PR. However, after dissecting the code, it seems that there is no way, from the Python library all the way down to the CUDA kernel, to pad by a different number of pixels on each side, for example: left_pad = 1, right_pad = 2.

Would implementing this enhancement require rewriting both the CPU and CUDA kernels to account for this padding in the same way that _ConvNd does?

Can we get away with a call to F.pad(), as the implementation of _ConvNd seems to suggest?

I'm asking you this since I've seen that you're the one responsible for the implementation of the kernels and therefore you would have knowledge regarding the performance impact of complicating the padding in C++ vs calling F.pad() in Python.
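
As a rough illustration of the pad-first idea, the 1-D sketch below pads a signal asymmetrically and then runs a convolution with no padding of its own. It uses plain Python lists, with `pad_1d` and `conv1d_valid` as hypothetical stand-ins for `F.pad()` and a padding-free kernel; it says nothing about the performance question or about how the deformable kernel handles offsets near the border.

```python
def pad_1d(signal, left, right, value=0.0):
    """Pad a list on each side (plain-Python stand-in for F.pad)."""
    return [value] * left + list(signal) + [value] * right


def conv1d_valid(signal, kernel):
    """Cross-correlation with no implicit padding (stride 1, dilation 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0, 1.0]   # even-sized kernel

total = len(kernel) - 1          # 3 pixels of padding keep the length
left = total // 2                # 1
right = total - left             # 2

out = conv1d_valid(pad_1d(signal, left, right), kernel)
print(out)                       # [6.0, 10.0, 14.0, 12.0, 9.0]
print(len(out) == len(signal))   # True
```

The kernel itself only ever sees symmetric (here zero) padding, so in this toy setting the asymmetric case is handled entirely before the convolution runs.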

datumbox commented 2 years ago

@dario-loi This is a great question. Unfortunately I don't know the answer, and we would probably need to dig deep into how the specific kernel is designed. I will need to find the bandwidth for such a thing as I'm a bit swamped at the moment. So if you continue working on this feature, keep in mind that I might not be able to review or merge it.

> I'm asking you this since I've seen that you're the one responsible for the implementation of the kernels and therefore you would have knowledge regarding

Unfortunately this is not true. We did a major refactoring of the C++ codebase at one point, to ensure that there is proper encapsulation and that we follow the latest practices from the Dispatcher. Though I tried to rewrite as few parts as possible during the refactoring, GitHub was unable to detect this in the diff, and thus when you git blame you see me as the uber-author of all kernels. That's far from the truth.