mattpoggi / mono-uncertainty

CVPR 2020 - On the uncertainty of self-supervised monocular depth estimation

How to train the self-teaching framework, an extended question #9

Closed kishore-greddy closed 3 years ago

kishore-greddy commented 3 years ago

I was reading your paper "On the uncertainty of self-supervised monocular depth estimation". Firstly, I would like to congratulate you on writing such a useful and good paper. Not much research has been done on uncertainty, and you have done it. Thank you for that. I have a question regarding the loss function in the self-teaching approach. Since the training code is not provided, I wanted to implement it myself in order to reproduce your results for my project. If my understanding is correct, I first use the pretrained model to generate pseudo ground truths, and then use them to train the student network. The student network has two output channels: one is the disparity and the other is the uncertainty. So, in the loss function,

[image: self-teaching loss equation]

1) Does \mu mean the disparity of the student network, and \sigma the uncertainty generated at the second output channel?
2) Does the L1 difference operator take the pixel-wise difference between teacher and student, with the sum of all the pixel-wise losses then being minimized?

Please let me know. I am stuck at this point and it would be really helpful if you could shed some light on it.

Edit: I have already seen issue #2, but there was no real mention of the loss there. This question might be a simple one, but I am no expert in this field and have only just started working on it. Excuse any ignorance. Thanks.

mattpoggi commented 3 years ago

Hi,

  1. \mu and \sigma are the two output channels, respectively. They take the meaning of the estimated mean and the estimated variance of the depth distribution d_s predicted by the student.
  2. Yes, the L1 loss is computed pixel-wise. Then you divide (still pixel-wise) by the variance and add the log term, which keeps the network from predicting infinite variance. Finally, you compute the average over all pixels.

Hope this helps
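
The two steps above can be sketched in a few lines. This is only an illustrative NumPy version, not the authors' training code (which is not part of the repository). The function name `self_teaching_loss` and the argument names are hypothetical, and the sketch assumes the second output channel holds log \sigma rather than \sigma itself, so that the division and the log term stay numerically well defined.

```python
import numpy as np


def self_teaching_loss(mu, log_sigma, teacher):
    """Sketch of the loss described above (hypothetical helper).

    mu        -- first student output channel (estimated mean)
    log_sigma -- second student output channel; ASSUMED to be the log of
                 the uncertainty, which is a choice made for this sketch
    teacher   -- pseudo ground truth produced by the pretrained teacher
    """
    mu = np.asarray(mu, dtype=np.float64)
    log_sigma = np.asarray(log_sigma, dtype=np.float64)
    teacher = np.asarray(teacher, dtype=np.float64)

    # Pixel-wise L1 difference between student mean and teacher output.
    l1 = np.abs(mu - teacher)

    # Divide pixel-wise by the uncertainty, then add the log term that
    # penalizes arbitrarily large uncertainty.
    per_pixel = l1 / np.exp(log_sigma) + log_sigma

    # Average over all pixels.
    return per_pixel.mean()


if __name__ == "__main__":
    student_mean = [[1.0, 3.0]]
    student_log_sigma = [[0.0, 0.0]]   # sigma = 1 everywhere
    teacher_output = [[0.0, 1.0]]
    print(self_teaching_loss(student_mean, student_log_sigma, teacher_output))  # 1.5
```

In an actual training loop the same expression would be written with the framework's own tensors so that gradients flow to both output channels. Without the log term, the loss could be driven towards zero simply by predicting a very large uncertainty everywhere.
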
kishore-greddy commented 3 years ago

Hi @mattpoggi, thanks for the explanation. It sure helps 👍