Hi, I'm trying to reimplement deformable convolution in TensorFlow. In the original paper, I found that the number of filters for the offsets is 2c, where c is the number of input channels, and the output offset field has the same spatial resolution as the input. In this repo, however, the number of filters for the offsets is 2 * kernel_size ** 2, and the offset field has the same resolution as the final output, with no dependence on the number of input channels. I understand the idea behind this implementation and how it actually works. However, I wonder whether this choice affects the final performance, especially when the number of input channels is large. Thank you.
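To make the difference concrete, here is a rough plain-Python sketch that compares the offset-field shapes and the parameter counts of the offset convolution under the two layouts described above. It assumes CHW tensors without a batch axis, an offset conv that uses the same k x k kernel as the main conv, and "same" padding for the per-channel variant so its output matches the input resolution. The function names `conv_out_size` and `offset_shapes` are hypothetical helpers for illustration, not part of this repo.

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Standard convolution output length along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def offset_shapes(c_in: int, h: int, w: int, k: int, stride: int = 1, padding: int = 0) -> dict:
    """Compare the two offset layouts (CHW, no batch axis).

    per_channel: 2 * c_in filters, one (dy, dx) pair per input channel,
        at input resolution (assumes the offset conv uses "same" padding).
    per_kernel_point: 2 * k * k filters, one (dy, dx) pair per sampling point
        of the k x k kernel, at the output resolution of the deformable conv.
    In both cases the offset conv is assumed to be a k x k conv over c_in channels.
    """
    h_out = conv_out_size(h, k, stride, padding)
    w_out = conv_out_size(w, k, stride, padding)

    per_channel_filters = 2 * c_in
    per_point_filters = 2 * k * k

    return {
        "per_channel": {
            "offset_shape": (per_channel_filters, h, w),
            # weights (k * k * c_in per filter) plus one bias per filter
            "params": k * k * c_in * per_channel_filters + per_channel_filters,
        },
        "per_kernel_point": {
            "offset_shape": (per_point_filters, h_out, w_out),
            "params": k * k * c_in * per_point_filters + per_point_filters,
        },
    }


if __name__ == "__main__":
    for c in (64, 256, 1024):
        r = offset_shapes(c, 32, 32, 3, stride=1, padding=1)
        print(c, r["per_channel"], r["per_kernel_point"])
```

Under these assumptions, the per-channel layout's filter count grows linearly with c, so its offset-conv parameters grow roughly with c squared, while the per-kernel-point layout keeps 2 * k * k filters regardless of c. This only compares sizes; it says nothing about accuracy.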