isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Depth in float32 in meters #36

Closed n-kasatkin closed 4 years ago

n-kasatkin commented 4 years ago

Hello! Thanks for your work!

I have two questions:

  1. What is the .pfm format and what is it used for?
  2. When opening the .png depth maps, how do I convert them to float32 values in meters?
ranftlr commented 4 years ago

PFM files contain the results as the original, unaltered float32 values. The PNGs contain the quantized values since the format requires integer data. If you need the original data as predicted by the model, use the PFM files. You can find a reader here: https://github.com/intel-isl/MiDaS/blob/master/utils.py
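
For reference, a minimal PFM reader might look like the sketch below (the function name read_pfm and the header handling are illustrative; the reader in utils.py may differ in details):

    import numpy as np

    def read_pfm(path):
        # PFM layout: a header line "Pf" (grayscale) or "PF" (color),
        # then "width height", then a scale factor whose sign encodes
        # endianness, followed by raw float32 samples, bottom row first.
        with open(path, "rb") as f:
            header = f.readline().decode("ascii").strip()
            if header not in ("Pf", "PF"):
                raise ValueError("Not a PFM file")
            width, height = map(int, f.readline().decode("ascii").split())
            scale = float(f.readline().decode("ascii").strip())
            dtype = "<f4" if scale < 0 else ">f4"  # negative scale => little-endian
            data = np.fromfile(f, dtype=dtype)
        if header == "PF":
            data = data.reshape(height, width, 3)
        else:
            data = data.reshape(height, width)
        return np.flipud(data)  # flip so row 0 is the top of the image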

As for getting results in meters: The model provides results up to unknown scale and shift, i.e. only relative depth is available. You'd need additional measurements to find the scale and the shift to get absolute measurements.
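
For example, if metric depth is known at even a handful of pixels (from a depth sensor, say), the scale and shift could be fitted in inverse-depth space. A sketch, where pred_at_px, depth_m, and prediction are hypothetical names for the predictions at those pixels, the known metric depths, and the full inverse-depth map:

    import numpy as np

    # Fit idepth ~= s * pred + t at the measured pixels, then invert to meters.
    s, t = np.polyfit(pred_at_px, 1.0 / depth_m, 1)
    metric_depth_map = 1.0 / (s * prediction + t)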

ranftlr commented 4 years ago

Closing due to inactivity.

tarashakhurana commented 4 years ago

Hi, thank you for your amazing work! In your paper, Table 1 lists the depth annotations of the six test datasets as metric. How do you get metric depth from your relative inverse-depth predictions when evaluating on these datasets? Do you invert the inverse-depth predictions and then compute a least-squares fit to find the scale and shift parameters?

ranftlr commented 4 years ago

We align in inverse-depth space, i.e. the procedure is:

1. Invert the ground truth.
2. Align the prediction based on a least-squares criterion.
3. Invert the aligned prediction to get depth.
4. Measure errors against the original ground-truth depth.

Section C in the supplementary provides more details on the evaluation procedure.
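
A sketch of these four steps (assuming pred_idepth is the model's inverse-depth output and gt_depth holds valid, positive metric depth, both flattened to 1-D; abs-rel is just one example error metric):

    import numpy as np

    def align_and_evaluate(pred_idepth, gt_depth):
        gt_idepth = 1.0 / gt_depth                               # 1) invert ground truth
        A = np.stack([pred_idepth, np.ones_like(pred_idepth)], axis=1)
        (s, t), *_ = np.linalg.lstsq(A, gt_idepth, rcond=None)   # 2) least-squares alignment
        aligned_depth = 1.0 / (s * pred_idepth + t)              # 3) invert aligned prediction
        abs_rel = np.mean(np.abs(aligned_depth - gt_depth) / gt_depth)  # 4) measure error
        return abs_rel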

sidml commented 3 years ago

@ranftlr I went through equation 4 of the paper. This is how I implemented it. Please correct me if I am misunderstanding something. Let's say pred_idepth is the model's predictions of size=(N,), and gt_idepth is the ground-truth inverse depth, size=(N,).

    import numpy as np

    # Eq. 4: h_opt = (sum_i d_i d_i^T)^(-1) (sum_i d_i d_i^*), with d_i = (pred_i, 1)^T
    di = np.concatenate([pred_idepth[:, None], np.ones((len(pred_idepth), 1))], 1)  # (N, 2)
    di_star = gt_idepth
    val1, val2 = np.zeros((2, 2)), np.zeros(2)
    for i in range(len(di)):
        val1 = val1 + np.outer(di[i], di[i])  # outer product, not elementwise
        val2 = val2 + di[i] * di_star[i]
    st = np.linalg.pinv(val1) @ val2  # matrix solve; elementwise 1/val1 * val2 would be wrong

st should then hold the estimated scale and translation (h_opt in the paper).
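
As a side note, the loop can be collapsed into the normal equations directly; a sketch of the equivalent vectorized form:

    val1 = di.T @ di        # (2, 2): sum of outer products d_i d_i^T
    val2 = di.T @ di_star   # (2,): sum of d_i * d_i^star
    st = np.linalg.pinv(val1) @ val2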

tarashakhurana commented 3 years ago

Embarrassingly enough, I'm not able to find the supplementary material online. Does anyone have the link?

sidml commented 3 years ago

@tarashakhurana I think they removed the supplementary material in v3 of the paper. It exists in v2; you can find it here.

tarashakhurana commented 3 years ago

Thank you! By any chance, were you able to reproduce the test-set results in the paper? I'm trying to evaluate on Sintel (on its 1064 images), but my evaluation code gives a much higher error than what is reported in the paper (0.605 from my code vs. the reported 0.327).