noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

About indicator sq_rel #138

Closed LLLYLong closed 4 months ago

LLLYLong commented 4 months ago

Hello author, I am glad to have come across such an excellent paper. I have a question: when retraining, all other metrics perform well, but sq_rel is very hard to lower. What could be the reason? This has puzzled me for a long time and I have not found a good fix. Have you run into this kind of problem, and how should I go about solving it? I look forward to your reply, thank you.

[Image: my training results]

[Image: paper results]

I should add that this problem is not specific to this network; I ran into it while making further modifications. I would like to hear about the author's experience if something similar came up during your experiments.

noahzn commented 4 months ago

Hi, it's good to know that you have improved the results a lot since the last time you asked me.

sq_rel is the squared relative error: the mean over valid pixels of |d_pred - d_gt|^2 / d_gt. I think this error may be reduced if you have smoother and more consistent depth predictions.
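To make the definition concrete, here is a minimal NumPy sketch of sq_rel next to abs_rel. The function names and the toy values are assumptions for illustration, not code from the Lite-Mono repository. It shows why sq_rel can stay high while other metrics look fine: the error is squared before dividing by the ground truth, so a few large errors on distant pixels dominate the mean.

```python
import numpy as np


def sq_rel(pred, gt):
    """Squared relative error: mean(|pred - gt|^2 / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # assumption: pixels without ground truth are stored as 0
    return float(np.mean((pred[valid] - gt[valid]) ** 2 / gt[valid]))


def abs_rel(pred, gt):
    """Absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


# Toy example (hypothetical values): three perfect pixels and one far
# pixel that is off by 20 m.
gt = np.array([10.0, 10.0, 10.0, 80.0])
pred = np.array([10.0, 10.0, 10.0, 60.0])

print("abs_rel:", abs_rel(pred, gt))  # (20 / 80) / 4
print("sq_rel: ", sq_rel(pred, gt))   # (20^2 / 80) / 4
```

In this toy case the single far-range outlier contributes 0.0625 to abs_rel but 1.25 to sq_rel, which is consistent with the suggestion that smoother, more consistent predictions (fewer large outliers) should help this metric most.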

noahzn commented 4 months ago

I am now closing this issue as there is no feedback.

LLLYLong commented 4 months ago

Thank you for the reply, and I apologize for not seeing the message in time. I'll try the smoothing factor right away to see if it makes a difference.

I would also like to ask another question: does the author know anything about the relationship between relative depth and metric depth? I'm a bit confused about how relative depth is converted to metric depth. Do they just differ by an unknown focal length?

noahzn commented 4 months ago

@LLLYLong Just like other monocular self-supervised methods, Lite-Mono can only predict relative depth. But in the evaluation we use median scaling (the ratio of the ground-truth median to the predicted median) to scale the depth values.
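The scaling step mentioned above is commonly implemented as per-image median scaling: the prediction is multiplied by the ratio of the ground-truth median to the predicted median. The sketch below assumes that convention. The function name `median_scale` is hypothetical, and the depth range of 1e-3 to 80 m is the usual KITTI setting, assumed here rather than taken from this thread.

```python
import numpy as np


def median_scale(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Align a relative depth map to ground truth with one scalar per image.

    min_depth / max_depth are assumed evaluation limits (typical for KITTI).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Only pixels with ground truth inside the evaluation range are used.
    valid = (gt > min_depth) & (gt < max_depth)

    # One scale factor per image: median(gt) / median(pred).
    ratio = np.median(gt[valid]) / np.median(pred[valid])

    # Scale the prediction and clamp it to the evaluation range.
    scaled = np.clip(pred * ratio, min_depth, max_depth)
    return scaled, ratio


# Toy example (hypothetical values): the prediction is exactly half the
# true depth everywhere, so a single factor recovers metric depth.
gt = np.array([2.0, 4.0, 6.0, 8.0])
pred = np.array([1.0, 2.0, 3.0, 4.0])

scaled, ratio = median_scale(pred, gt)
print("scale factor:", ratio)
print("scaled depth:", scaled)
```

Note that this uses the ground truth to find the scale, so it is an evaluation tool rather than a way to obtain metric depth at inference time.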

LLLYLong commented 3 months ago

@noahzn So scaling the depth with the median also makes the depth values meaningful. Am I right to understand that relative depth and metric depth differ by a scaling factor? That is, the network predicts relative depth, and if I specify a range of depths and scale the depth map accordingly, I get metric depth?

noahzn commented 3 months ago

But in this way you cannot get very accurate depth. It also depends on the dataset you use.