kakaxi314 / BP-Net

Implementation of our paper 'Bilateral Propagation Network for Depth Completion'
MIT License

Inference at higher resolution on NYU #12

Closed massimilianoviola closed 1 month ago

massimilianoviola commented 1 month ago

Hello, thanks a lot for providing the code for your method! I was trying it out on NYU, but doing inference at the original resolution of 640x480, still using 500 random valid points. As you can see in the plot, I get some bizarre texture in the results. I acknowledge that the point density is now more than four times lower, and that your paper's ablations also show a significant performance loss with fewer points, but I cannot explain why these square textures should appear in the first place, especially in flat regions of the background. Do you have any intuition about which part of the network might be failing there?

[Image: comparison of predictions at the two resolutions]

Just to double-check: I modified the NYU dataloader so that it does not apply resizing and cropping to the RGB and GT, the camera matrix is not transformed, and no padding is used, since the resolution is divisible by 32. Is this the correct procedure to test generalization to other resolutions? And for a different dataset, should I just swap in its camera matrix? Best, Massimiliano
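The sparse-input setup described above (keeping 500 random valid pixels of the dense ground truth) can be sketched like this. This is a minimal illustration, not the repo's actual dataloader code; `sample_sparse_depth` and the array names are made up for the example:

```python
import numpy as np

def sample_sparse_depth(gt_depth, num_points=500, seed=0):
    """Simulate a sparse depth input by keeping `num_points` random
    valid pixels of a dense ground-truth depth map (NYU-style setup)."""
    rng = np.random.default_rng(seed)
    valid = np.flatnonzero(gt_depth > 0)              # indices of valid pixels
    keep = rng.choice(valid, size=num_points, replace=False)
    sparse = np.zeros_like(gt_depth)                  # zeros mark "no measurement"
    sparse.flat[keep] = gt_depth.flat[keep]
    return sparse

# Full NYU resolution of 640x480 (synthetic depth, for illustration only):
gt = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)
sd = sample_sparse_depth(gt, num_points=500)
```

At 640x480 those 500 points cover roughly 0.16% of the pixels, versus about 0.68% at a 320x240-scale input, which is the "more than four times lower density" mentioned above.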

kakaxi314 commented 1 month ago

Sorry, I don't know where these squares come from. In my view, you should modify the camera matrix so that the valid depth corresponds to the same 3D space. Also, since our method adopts the direct pixel distance as the offset term, evaluation at a higher resolution may put this offset term in a very different distribution than during training (e.g., the largest possible offset is 320 during training, but now it may be 640).
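The camera-matrix adjustment suggested here can be sketched as follows. This is a hedged sketch, not code from the repo; the function names are illustrative, but the underlying rule is standard pinhole geometry: resizing an image by (sx, sy) scales fx/cx by sx and fy/cy by sy, and cropping shifts the principal point:

```python
import numpy as np

def scale_intrinsics(K, sx, sy):
    """Adjust a 3x3 pinhole intrinsic matrix for an image resized by (sx, sy)."""
    K = K.copy()
    K[0, 0] *= sx  # fx scales with width
    K[0, 2] *= sx  # cx scales with width
    K[1, 1] *= sy  # fy scales with height
    K[1, 2] *= sy  # cy scales with height
    return K

def crop_intrinsics(K, x0, y0):
    """Adjust intrinsics for a crop whose top-left corner is (x0, y0)."""
    K = K.copy()
    K[0, 2] -= x0  # principal point shifts by the crop offset
    K[1, 2] -= y0
    return K

# Illustrative NYU-like intrinsics (not the exact calibration):
K_full = np.array([[582.6, 0.0, 313.0],
                   [0.0, 582.7, 238.4],
                   [0.0, 0.0, 1.0]])
K_half = scale_intrinsics(K_full, 0.5, 0.5)  # matrix for a 320x240 image
```

With a consistent camera matrix, the same depth value at corresponding pixels backprojects to the same 3D point at either resolution.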

massimilianoviola commented 1 month ago

When the sparsity level rises, the texture appears at the training resolution too. This suggests that density plays a role, although the degradation is now milder and sets in later. But I see your point as well, thanks. I was just expecting the prediction to become blurry rather than to form these squares.

[Image: predictions at the training resolution under increasing sparsity]

What I meant regarding the camera matrix is that I kept the original NYU one, without dividing it by two and shifting the principal-point coordinates, so it effectively corresponds to 640x480 again. It should therefore be acting in the correct 3D space. I would do the same with other datasets: swap in their camera matrix and adapt it to the resolution I want to process at.
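The claim above (keeping the full-resolution matrix with full-resolution images keeps everything in the same 3D space) can be checked numerically. A minimal sketch with illustrative NYU-like intrinsics, not the exact calibration: a pixel at half resolution with a halved camera matrix backprojects to the same 3D point as the corresponding pixel at full resolution with the original matrix:

```python
import numpy as np

def backproject(u, v, z, K):
    """Unproject pixel (u, v) with depth z through pinhole intrinsics K."""
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])

# Illustrative NYU-like intrinsics (not the exact calibration):
K_full = np.array([[582.6, 0.0, 313.0],
                   [0.0, 582.7, 238.4],
                   [0.0, 0.0, 1.0]])
K_half = K_full.copy()
K_half[:2] *= 0.5  # halve fx, fy, cx, cy for a 320x240 image

z = 2.0
p_full = backproject(400.0, 300.0, z, K_full)  # pixel at full resolution
p_half = backproject(200.0, 150.0, z, K_half)  # corresponding half-res pixel
assert np.allclose(p_full, p_half)             # same 3D point either way
```

So as long as image, depth, and intrinsics are kept at the same resolution convention, the geometry is consistent; what changes between resolutions is the pixel-space offset distribution the maintainer points out above.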