yzcjtr / GeoNet

Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
MIT License
723 stars 181 forks

Loss and training of the Rigid Structure Reconstructor #67

Closed yijie0710 closed 4 years ago

yijie0710 commented 4 years ago

Hello, dear authors. I have read the paper and am trying to reproduce a PyTorch version of it. I was wondering whether it is possible for the rigid structure reconstructor (RSR) to be too lazy to learn the depth. In Eq. (1) of the paper, could the net just output T*D as E? Then the rigid flow would be zero and the depth of all pixels would be the same. The net would reconstruct the target image as identical to the source image and eventually could not learn the real depth and pose.

BTW, in my experiments, the smooth loss converges quickly but the rigid warp loss doesn't change much. The depths are quite noisy at first; then, after thousands of iterations, the depths of all pixels become identical.
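A small sketch can probe the worry above. The names and numbers here are illustrative (not the authors' code): it back-projects a constant depth map, applies a rigid motion, and reprojects. With any nonzero translation, a constant depth map still produces nonzero rigid flow, so the warp is not the identity and the warping loss is not trivially satisfied.

```python
import numpy as np

# Hypothetical minimal rigid-flow computation (illustrative only):
# back-project pixels with predicted depth, apply a camera motion (R, t),
# reproject, and take the pixel displacement as the rigid flow.
def rigid_flow(depth, K, R, t):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)  # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)        # back-project to 3D
    cam2 = R @ cam + t.reshape(3, 1)                           # rigid transform
    proj = K @ cam2
    proj = proj[:2] / proj[2:3]                                # perspective divide
    return (proj - pix[:2]).reshape(2, h, w)

# Degenerate "all pixels same depth" prediction, small lateral translation.
K = np.array([[100., 0., 16.], [0., 100., 12.], [0., 0., 1.]])
depth = np.full((24, 32), 5.0)
flow = rigid_flow(depth, K, np.eye(3), np.array([0.2, 0.0, 0.0]))
print(np.abs(flow).max())  # nonzero: constant depth does not give zero flow
```

The degenerate case the question describes would require zero flow, which here happens only if the pose net also collapses to the identity motion; the warping loss penalizes that whenever the scene actually moves between frames.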

yijie0710 commented 4 years ago

I mean, maybe there should be a regularization on the depth to prevent the "all-same depth" output?
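To make the suggestion concrete, here is one hypothetical form such a regularizer could take (not part of GeoNet, and the function name is made up): penalize low spatial variance of the predicted depth so a constant map is discouraged. NumPy is used for clarity; a PyTorch version would apply the same idea with `torch.var`.

```python
import numpy as np

def anti_collapse_loss(depth, eps=1e-6):
    # Hypothetical regularizer: large when depth is nearly constant,
    # small when depth varies across the image.
    return 1.0 / (np.var(depth) + eps)

flat = np.full((24, 32), 5.0)                                   # collapsed prediction
varied = np.random.default_rng(0).uniform(1.0, 10.0, (24, 32))  # non-degenerate one
print(anti_collapse_loss(flat) > anti_collapse_loss(varied))    # True
```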

yzcjtr commented 4 years ago
  1. Since we enforce the warping loss at both stages, including the rigid reconstruction part, an identical depth prediction cannot lead to convergence of the optimization;
  2. I think it depends on your network initialization. You can check whether most of the rigid flow points to coordinates outside the image plane; in that case, the warping loss doesn't give you a reasonable supervision signal. Another way to debug it is to try different scaling coefficients for the posenet or depthnet output (https://github.com/yzcjtr/GeoNet/blob/5b176025bc63563ef53297aa3d20cc6e575fb833/geonet_nets.py#L10-L11). You can also visualize the warped images guided by the rigid flow.

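The out-of-plane check in point 2 can be sketched as follows. This is an illustrative helper, not code from the repo: it measures what fraction of pixels the rigid flow sends outside the image bounds. If that fraction is near 1 (e.g. because the pose scaling coefficient is too large), the warping loss gives almost no valid supervision and training can stall.

```python
import numpy as np

def out_of_bounds_fraction(flow, h, w):
    # flow has shape (2, h, w): x and y displacement per pixel.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    tx, ty = xs + flow[0], ys + flow[1]
    oob = (tx < 0) | (tx > w - 1) | (ty < 0) | (ty > h - 1)
    return oob.mean()

h, w = 24, 32
flow = np.zeros((2, h, w))
flow[0] = 100.0  # e.g. a pose scale that is far too large: warp shoots off-image
print(out_of_bounds_fraction(flow, h, w))  # 1.0 — no pixel gets a valid target
```

Logging this fraction during the first few hundred iterations is a cheap way to see whether the scaling coefficients mentioned above need adjusting.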
yijie0710 commented 4 years ago

Thank you so much for your reply. I will try that.