occasions where sampling locations are outside of the image boundary

DCN and DCNv2 are very powerful. EDVR, the winner of NTIRE 2019, used DCNv2 for frame alignment. However, it seems to be unstable during training: the author of EDVR said that unstable offsets occur very frequently. According to DCNv2_op/README.md:

To better handle occasions where sampling locations are outside of the image boundary. In the previous operator, if the sampling location is outside of the feature map boundary, its sampled value would be zero. Thus, the gradient with respect to learnable offset would be zero. We found such a scheme may deteriorate the performance in ImageNet classification (perhaps because the feature maps are of low resolution). For object detection on COCO, both the previous and the updated operators deliver the same results. In the new operator, if the sampling location is within one pixel outside of the feature map boundary, bilinear sampling would also be applied. And gradient with respect to learnable offset can be non zero for such locations. This is implemented by padding zeros (by one row/column) outside of the boundaries of feature maps, and performing bilinear sampling on the padded feature maps.
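
To make the README's description concrete, here is a minimal pure-Python sketch of the two behaviours as I understand them. It is not the actual CUDA operator: the function names (`sample_new`, `sample_old`, `grad_x`) are hypothetical, "outside the boundary" is assumed to mean outside [0, H-1] x [0, W-1], and the gradient is estimated with a finite difference. Treating every out-of-range pixel as zero is equivalent to padding by one row/column, because locations more than one pixel outside only touch zeros either way.

```python
import math


def sample_new(feat, y, x):
    """Bilinear sampling on a zero-padded 2D feature map (list of lists)."""
    h, w = len(feat), len(feat[0])

    def at(r, c):
        # Out-of-range pixels read as zero (the zero padding).
        return feat[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * at(y0, x0)
            + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0)
            + dy * dx * at(y0 + 1, x0 + 1))


def sample_old(feat, y, x):
    """Assumed model of the previous operator: zero for any outside location."""
    h, w = len(feat), len(feat[0])
    if y < 0 or y > h - 1 or x < 0 or x > w - 1:
        return 0.0
    return sample_new(feat, y, x)


def grad_x(sampler, feat, y, x, eps=1e-4):
    """Central finite-difference estimate of d(sample)/dx."""
    return (sampler(feat, y, x + eps) - sampler(feat, y, x - eps)) / (2 * eps)


feat = [[1.0, 2.0],
        [3.0, 4.0]]

# Half a pixel to the left of the top-left pixel.
value_new = sample_new(feat, 0.0, -0.5)
grad_new = grad_x(sample_new, feat, 0.0, -0.5)
grad_old = grad_x(sample_old, feat, 0.0, -0.5)

# More than one pixel outside: only zeros are sampled.
grad_far = grad_x(sample_new, feat, 0.0, -1.5)
```

In this sketch the padded sampler returns a blend of the boundary pixel and zero at x = -0.5, so the offset still receives a gradient there, while the assumed old behaviour and any location more than one pixel outside give a zero gradient.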

I'm wondering why zero padding (bilinear sampling against zeros) is used instead of sampling the boundary pixel. Could the deformable layer learn to produce a large out-of-boundary offset so that the gradient becomes zero?
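
For reference, here is a small 1D sketch of the two options I am comparing. It assumes that "using the boundary pixel" means clamping the sampling coordinate to the valid range (replicate-style padding); the names `sample_1d` and `grad` and the `mode` strings are made up for illustration and are not part of the DCNv2 operator.

```python
import math


def sample_1d(row, x, mode):
    """Linear interpolation of `row` at x; mode is "zeros" or "border"."""
    w = len(row)
    if mode == "border":
        # Clamp the coordinate so outside locations read the boundary pixel.
        x = min(max(x, 0.0), w - 1.0)

    def at(c):
        # Out-of-range pixels read as zero (only reachable in "zeros" mode
        # with a non-zero weight).
        return row[c] if 0 <= c < w else 0.0

    x0 = math.floor(x)
    dx = x - x0
    return (1 - dx) * at(x0) + dx * at(x0 + 1)


def grad(row, x, mode, eps=1e-4):
    """Central finite-difference estimate of d(sample)/dx."""
    return (sample_1d(row, x + eps, mode) - sample_1d(row, x - eps, mode)) / (2 * eps)


row = [2.0, 5.0]

g_zeros_near = grad(row, -0.5, "zeros")    # within one pixel outside
g_border_near = grad(row, -0.5, "border")  # same location, clamped
g_zeros_far = grad(row, -1.5, "zeros")     # more than one pixel outside
```

Under this assumption, the clamped variant returns a constant value outside the boundary, so its gradient with respect to the offset is zero as soon as the location leaves the feature map, whereas zero padding keeps a non-zero gradient inside the one-pixel band. Both variants have zero gradient further out, which is the case my second question is about; the sketch does not say whether offsets actually drift there during training.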

msracver / Deformable-ConvNets
