mattpoggi / mono-uncertainty

CVPR 2020 - On the uncertainty of self-supervised monocular depth estimation

Content of uncertainty map by log method #4

Closed shawLyu closed 4 years ago

shawLyu commented 4 years ago

Hi, thanks for your great work. I noticed there were two CVPR 2020 works on monocular depth estimation using an uncertainty loss; the other one is D3VO. Both of you use the same uncertainty loss (the log section in your paper) but obtain totally different uncertainty maps. I can get an uncertainty map like yours, so I'd like to ask if you know the reason. Looking forward to your reply. Thanks.
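
If I understand correctly, the loss in the log section is roughly the following (my own sketch with illustrative tensor names, not code taken from either repository):

```python
import torch

def log_uncertainty_loss(photometric_error, sigma):
    # photometric_error: [B, H, W] per-pixel reprojection error
    # sigma:             [B, H, W] predicted uncertainty, constrained to be positive
    # The error is down-weighted by the predicted uncertainty, while log(sigma)
    # penalizes predicting arbitrarily large uncertainty everywhere.
    return (photometric_error / sigma + torch.log(sigma)).mean()
```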

mattpoggi commented 4 years ago

Hi @shawLyu, thanks for pointing this out, that's an interesting question. The main difference I've found between the two is that D3VO also estimates the brightness transformation parameters between the different frames. This may have an impact.
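
Roughly, that means the photometric error is computed against a brightness-aligned source image, something along these lines (just a sketch to illustrate the idea, with made-up names, not D3VO's actual code):

```python
import torch

def brightness_align(warped_src, a, b):
    # warped_src: [B, 3, H, W] source image warped into the target view
    # a, b:       [B, 1, 1, 1] predicted affine brightness parameters per frame pair
    # The affine transform compensates for exposure/illumination changes between
    # frames before the photometric error is computed.
    return a * warped_src + b
```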

shawLyu commented 4 years ago

Hi @mattpoggi, thanks for your reply. I will run this experiment next.

mattpoggi commented 4 years ago

I forgot to mention that, according to the D3VO paper, "DepthNet also predicts the depth map D_{t^s} of the right image I_{t^s}". This can also make a difference.

chinhsuanwu commented 3 years ago

Hi @mattpoggi

Thanks for your innovative work. I had the same confusion, but after conducting many experiments I found there might be an issue in the implementation (I am not sure about it, as neither mono-uncertainty nor D3VO has released their code).

In my opinion, when computing the Log loss, `to_optimise` should have the same shape as the uncertainty map.

```
(Pdb) to_optimise.shape
torch.Size([8, 192, 640])
(Pdb) uncer.shape
torch.Size([8, 1, 192, 640])
(Pdb) (to_optimise / uncer + torch.log(uncer)).shape
torch.Size([8, 8, 192, 640])
```

However, even if the shapes do not match, the operation is still legal: broadcasting pairs the [8, 192, 640] error with the [8, 1, 192, 640] uncertainty and produces an [8, 8, 192, 640] tensor, as shown above, which could lead to results like yours. On the other hand, D3VO performs this with matching shapes, and the results look totally different. Note that the networks below use plain monodepth2 and differ only in the shape of the uncertainty; no extra techniques (brightness transformation, right-disparity prediction, or augmentation) are used.

[image: comparison of the resulting uncertainty maps]
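
For completeness, here is a small standalone snippet reproducing the broadcasting behaviour from the pdb session above (random tensors stand in for the real network outputs):

```python
import torch

to_optimise = torch.rand(8, 192, 640)        # per-pixel photometric error
uncer = torch.rand(8, 1, 192, 640) + 0.1     # predicted uncertainty map (kept positive)

# Mismatched shapes: [8, 192, 640] broadcasts against [8, 1, 192, 640] into
# [8, 8, 192, 640], so every error map gets divided by every uncertainty map
# in the batch, not only by its own.
mismatched = to_optimise / uncer + torch.log(uncer)
print(mismatched.shape)  # torch.Size([8, 8, 192, 640])

# With the channel dimension removed, each error map is weighted only by its
# own uncertainty, which is what the matching-shape (D3VO-style) version does.
uncer_matched = uncer.squeeze(1)             # [8, 192, 640]
matched = to_optimise / uncer_matched + torch.log(uncer_matched)
print(matched.shape)     # torch.Size([8, 192, 640])
```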

Please let me know if I have misunderstood anything about your paper. Thank you.