zzangjinsun / NLSPN_ECCV20

Park et al., Non-Local Spatial Propagation Network for Depth Completion, ECCV, 2020
MIT License
321 stars 57 forks source link

The confidence issue still exists. #46

Open ashutosh1807 opened 2 years ago

ashutosh1807 commented 2 years ago

If we are training the nlspn model from scratch, then as per instructions we do not require legacy, but conceptually it may lead to wrong calculations. We have three inputs - per pixel confidence, per pixel neighbour offset, per pixel offsets wrt to 8 neighbours of 3x3 kernel across the pixel. If we don't use legacy, then for the confidence calculation of neighbours we are using wrong offset because offsets are wrt to anchors defined by 3x3 kernel across the pixel. That is, offset shout be wrt to the pixel itself. If we use legacy, then due to adjustment to offset_tmp by reference, the offsets become wrt to centre pixel which indeed is a wrong assumption while doing the deformable convolution.

zzangjinsun commented 2 years ago

Hi, I will check the codes again thx!

ashutosh1807 commented 2 years ago

Hi @zzangjinsun, any updates on the same?

zzangjinsun commented 2 years ago

I will be busy until the end of the semester. Will check the code during the summer break! (Let's keep this issue open for a while...)

nchodosh commented 1 year ago

Hi,

First off, nice work, and thanks for publishing the code!

I just wanted to ping this discussion since your original implementation and what @ashutosh1807 said seems correct. The offsets are relative to the normal kernel offsets, so when you shift to 1x1 convolutions, you must adjust them to account for that.

I think that this is clear with this alternative implementation of lines 113-144 that avoids the loop and 1x1 convolutions:

w_full = (torch.eye(9)).reshape(9, 1, 3, 3) # each output channel corresponds to one iteration of the original loop
b_full = torch.zeros(9)
conf_aff = deform_conv2d(confidence, offset, w_full, b_full, padding=1)
conf_aff = torch.cat(conf_aff[:, : self.idx_ref], conf_aff[:, (self.idx_ref + 1) :], dim=1) # delete the 0 offset output

From my tests, this gives the same results as your original (legacy) implementation.