Open dario-loi opened 2 years ago
Hi @dario-loi, thanks for raising this.
I don't want to argue semantics but TorchVision quite explicitly declares that expected inputs for paddings is either an integer or a Tuple of ints: https://github.com/pytorch/vision/blob/93c85bbcc31f8d5a052daf06f2f91f39697af1a4/torchvision/ops/deform_conv.py#L14-L23
Currently it doesn't support strings ({'same', 'valid'}
). Perhaps that's a worth expansion and possibly we would have to follow the same approach as _ConvNd
from PyTorch Core. It's not within our plans to do this now but if you are interested in sending a PR, we can certainly review it.
Thank you for the response,
I've began working on a possible PR, however, after dissecting the code it seems that there is no possibility, from the python library all the way down to the CUDA kernel, to have a padding with an uneven number of pixels, for example: left_pad = 1, right_pad = 2
.
Would the implementation of this enhancement require the rewriting of both CPU & CUDA kernels to take into account this padding in the same way that _ConvND
does?
Can we get away with a call to F.pad()
as the implementation of _ConvND
seems to be suggesting?
I'm asking you this since I've seen that you're the one responsible for the implementation of the kernels and therefore you would have knowledge regarding the performance impact of complicating the padding in C++ vs calling F.pad()
on Python.
@dario-loi This is a great question. Unfortunately I don't know the answer and we would probably need to dig deep on how the specific kernel is designed. I will need to find the bandwidth for such a thing as I'm a bit swamped at the moment. So if you continue work on this feature, keep in mind I might not be able to review or merge it.
I'm asking you this since I've seen that you're the one responsible for the implementation of the kernels and therefore you would have knowledge regarding
Unfortunately this is not true. We did a major refactoring of the C++ codebase at one point, to ensure that there is proper encapsulation and that we follow the latest practices from Dispatcher. Though I tried to rewrite as little parts as possible during the refactoring, unfortunately Github was unable to understand this on the diff and thus when you git blame you see me as the Uber author of all kernels. That's far from truth.
🐛 Describe the bug
Bug Explanation
The current implementation of
torchvision.ops.deform_conv2d
implicitly assumes that padding is passed as a tuple or an integer, this means that if the padding is passed as either "same" or "valid" then torchvision is going to parse it aspadding = ("same")
.Sample Code
Stacktrace
Versions