isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License
4.25k stars 597 forks

[question] Any suggestions on normalizing the outputs better? #253

Open wes-kay opened 7 months ago

wes-kay commented 7 months ago

Original: image

Output: image

This is an image I took of a porthole, converted to a .ply from the .pfm output (normalized before converting) and viewed in Blender.

The issue I'm having is that, realistically, the edges should be flatter and not scale to infinity as they currently do. Are there any tricks for making the .pfm a little easier to work with?

Any suggestions are appreciated.

northagain commented 7 months ago

Converting disparity to depth with a minimum and maximum depth, using this code, may be helpful:

    def disp_to_depth(disp, min_depth, max_depth):
        """Convert network's sigmoid output into depth prediction
        The formula for this conversion is given in the 'additional considerations'
        section of the paper.
        """
        min_disp = 1 / max_depth
        max_disp = 1 / min_depth
        scaled_disp = min_disp + (max_disp - min_disp) * disp
        depth = 1 / scaled_disp
        return scaled_disp, depth
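The helper above expects `disp` in [0, 1] (a sigmoid output), whereas later comments in this thread report raw outputs between 900 and 10000. A minimal sketch of one way to bridge that gap, assuming NumPy and a per-image min-max rescale of the raw output before the conversion. The function name `raw_to_bounded_depth` and the default `min_depth` / `max_depth` values are illustrative assumptions, not part of MiDaS:

```python
import numpy as np


def disp_to_depth(disp, min_depth, max_depth):
    """Map a disparity in [0, 1] to a depth in [min_depth, max_depth]."""
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return scaled_disp, 1.0 / scaled_disp


def raw_to_bounded_depth(raw, min_depth=0.5, max_depth=10.0):
    """Hypothetical wrapper: rescale raw inverse depth to [0, 1], then convert.

    min_depth and max_depth are assumed scene bounds chosen by the user.
    """
    raw = np.asarray(raw, dtype=np.float64)
    span = raw.max() - raw.min()
    # A constant image has no usable range; treat it as the farthest plane.
    disp01 = np.zeros_like(raw) if span == 0 else (raw - raw.min()) / span
    _, depth = disp_to_depth(disp01, min_depth, max_depth)
    return depth


raw = np.array([[900.0, 5450.0], [10000.0, 900.0]])
depth = raw_to_bounded_depth(raw)
print(depth)
```

Because the smallest raw value maps to `max_depth` rather than to an arbitrarily large depth, far regions stay bounded, which may help with edges that stretch toward infinity in the point cloud. The resulting depth is only as meaningful as the chosen bounds.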

isJHan commented 6 months ago

There is no 'additional considerations' section in the paper I downloaded from the IEEE website. Can you point me to the right version? Thank you!

Besides, I'm confused about the 'sigmoid output' mentioned in your code. I can't find a sigmoid layer in the dpt_beit_large_512 network, and the output of my network is between 900 and 10000. In this case, how can I transform the output to depth?

Thank you very much!

thucz commented 6 months ago

I have the same question. The inverse depth output is between 900 and 10000. Is the problem resolved?

isJHan commented 6 months ago

Hi. I convert the output to a depth map this way: first I invert the output directly with 'depth = 1/output', then I apply min-max normalization, 'depth = (depth - depth.min()) / (depth.max() - depth.min())', to get a valid depth map.
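The two steps described above can be sketched as follows, assuming NumPy and strictly positive raw outputs. The function name `inverse_then_minmax` is illustrative only:

```python
import numpy as np


def inverse_then_minmax(output):
    """Invert the raw output, then rescale the result to [0, 1].

    Assumes every value in `output` is strictly positive.
    """
    output = np.asarray(output, dtype=np.float64)
    depth = 1.0 / output
    return (depth - depth.min()) / (depth.max() - depth.min())


output = np.array([900.0, 1800.0, 10000.0])
rel = inverse_then_minmax(output)
print(rel)
```

The result is a relative depth map in [0, 1], not a metric one: the largest raw value maps to 0 (nearest) and the smallest to 1 (farthest). Since the model output is only defined up to an unknown scale and shift, inverting it directly ignores any shift.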

thucz commented 6 months ago

@isJHan Thanks!

isJHan commented 6 months ago

Today I found a bug in this procedure. When I run inference on another dataset, the output can be negative or 0, so a bias has to be added to the output, like depth=1/(output+bias). Do you have a better way? Thanks!
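One possible alternative to a hand-picked bias, offered as a sketch rather than a recommended fix: clamp the raw output to a small positive floor before inverting, so that zero and negative values become a finite "far" depth. The function name `safe_inverse` and the percentile-based floor are assumptions for illustration:

```python
import numpy as np


def safe_inverse(output, floor_percentile=1.0, eps=1e-6):
    """Invert raw output after clamping it to a positive floor.

    The floor is a low percentile of the positive values (an assumed
    heuristic), never smaller than eps.
    """
    output = np.asarray(output, dtype=np.float64)
    positive = output[output > 0]
    if positive.size == 0:
        raise ValueError("output contains no positive values")
    floor = max(np.percentile(positive, floor_percentile), eps)
    # Values at or below the floor all map to the same maximum depth.
    return 1.0 / np.maximum(output, floor)


output = np.array([-5.0, 0.0, 2.0, 4.0, 8.0])
depth = safe_inverse(output)
print(depth)
```

Unlike adding a bias, clamping leaves the values above the floor unchanged, at the cost of flattening everything beyond the floor onto a single far plane.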

thucz commented 6 months ago

I now also apply a scale and a bias to the output, like this (but without mapping it to [0, 1]):

https://github.com/KU-CVLAB/DaRF/blob/47b2d1a23d13f0d149e55cf8fd2195ec42093d1e/plenoxels/models/dpt_depth.py#L87C18-L87C18

isJHan commented 5 months ago

Thanks. But how can we get alpha and beta for another dataset?

thucz commented 5 months ago

@isJHan See Sec. A.2 (pages 13-14) of the RichDreamer supplementary material. It describes the general normalization methods.
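On the question of obtaining alpha and beta for a new dataset: when some reference depth is available (for example sparse points from a sensor or from structure-from-motion), one common option is a least-squares fit of scale and shift in inverse-depth space. The sketch below assumes NumPy; the function name `fit_scale_shift` and the synthetic data are illustrative, and this is not necessarily the method described in the RichDreamer supplement:

```python
import numpy as np


def fit_scale_shift(pred, target_disp, mask=None):
    """Least-squares fit of alpha, beta so that alpha * pred + beta ~ target_disp.

    pred:        raw model output (relative inverse depth)
    target_disp: reference inverse depth (1 / metric depth)
    mask:        optional boolean array marking pixels with valid reference
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target_disp, dtype=np.float64).ravel()
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    # Design matrix [pred, 1] for the linear model alpha * pred + beta.
    A = np.stack([pred[mask], np.ones(int(mask.sum()))], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    return alpha, beta


# Synthetic check: build a reference from a known scale and shift.
pred = np.array([900.0, 2000.0, 5000.0, 10000.0])
true_alpha, true_beta = 1e-4, 0.05
target_disp = true_alpha * pred + true_beta

alpha, beta = fit_scale_shift(pred, target_disp)
metric_depth = 1.0 / (alpha * pred + beta)
print(alpha, beta, metric_depth)
```

Without any reference depth, alpha and beta cannot be recovered from the prediction alone, and a per-image normalization is the usual fallback.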