isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

How to get the maximum depth value and its x, y coordinates? #131

Closed ramuneblue closed 2 years ago

ramuneblue commented 2 years ago

Hi, before running your code I would like to understand how to get the maximum depth as a numeric value, along with its x/y coordinates. Could I have your advice on this? If you could tell me the specific function that produces the maximum depth, it would be really helpful.

Allow me to add one more question: can your code be used on Google Colaboratory?

I'm looking forward to hearing from you soon!

ranftlr commented 2 years ago

Sorry, I don't quite understand your first question. You can take the max over all values, but that won't be informative as MiDaS only provides relative depth.

You can find a Colab here: https://colab.research.google.com/github/pytorch/pytorch.github.io/blob/master/assets/hub/intelisl_midas_v2.ipynb
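
For reference, here is a minimal sketch of roughly what the linked Colab does: loading MiDaS via torch.hub and running it on one image. The model and transform names follow the PyTorch Hub example and may differ between MiDaS versions, and the image path is a placeholder.

```python
import cv2
import torch

# Load a MiDaS model and its matching input transform from PyTorch Hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

# Read an image (placeholder path) and convert BGR -> RGB.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the original image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

output = prediction.cpu().numpy()  # relative (inverse) depth map, shape (H, W)
```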

ramuneblue commented 2 years ago

Thank you for your quick comment.
I see. Then could you tell me how, and from which function, I can get the max and min depth values in a single image? I would like to find the deepest point in one image, and also compare which point is the deepest across several still images. Even if the "depth" is relative, it should still be possible to check this using the max and min values.

I can access the Colab, thank you very much! It's really great.

ranftlr commented 2 years ago

The result in the Colab is a numpy array, so you can simply use numpy.max and numpy.min.
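
A minimal sketch, assuming `output` is the 2D numpy array produced at the end of the Colab (e.g. as in the sketch above). Keep in mind that MiDaS predicts relative inverse depth, so the maximum value corresponds to the nearest region rather than the farthest.

```python
import numpy as np

# Extreme values of the prediction.
max_value = output.max()
min_value = output.min()

# Pixel coordinates (row, column) of the extreme values.
y_max, x_max = np.unravel_index(np.argmax(output), output.shape)
y_min, x_min = np.unravel_index(np.argmin(output), output.shape)

print(f"max {max_value:.3f} at (x={x_max}, y={y_max})")
print(f"min {min_value:.3f} at (x={x_min}, y={y_min})")
```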

ramuneblue commented 2 years ago

Thank you for the simple answer. Actually, I'm not a Python person, so I'll try to work it out from there.

JJrodny commented 2 years ago

Thanks for this! How could we then convert the output to meters? There seem to be many questions about this in the issues, but I'm having trouble understanding how to do it. Could you show us, here or in the Colab, how to do that?

#4 seems the closest, but as far as I understand it, that involves further estimating the camera intrinsics (even if we already had the camera intrinsics).

Say I have figured out what my camera intrinsics are, like in #5,

```json
{
    "height" : 480,
    "intrinsic_matrix" :
    [
        610.0023193359375,
        0.0,
        0.0,
        0.0,
        609.85760498046875,
        0.0,
        425.36004638671875,
        237.9273681640625,
        1.0
    ],
    "width" : 848
}
```
(The matrix is column-major: entries 0, 4, 6, 7 are fx, fy, cx, cy.)

#5 suggests that even if we have camera intrinsics like the above, we would still not get absolute depth in meters, only a relative depth for each pixel.

#42 says we still need the ground-truth depth of at least two pixels to use MiDaS's output to infer the depth of all the other pixels:

> You'd need to know the absolute depth of at least two pixels in the image to derive the two unknowns. Based on these measurements you could align the predictions to these measurements as done in our SSIMSE loss.

But how can we use the SSIMSE loss from your paper to estimate the shift and scale for an image, given the GT depths of two pixels?

ranftlr commented 2 years ago

See the code here https://gist.github.com/ranftlr/45f4c7ddeb1bbb88d606bc600cab6c8d

compute_scale_and_shift computes the scale and shift that align the prediction to the target, as in the SSIMSE loss.
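
Not the gist verbatim (which works on PyTorch tensors), but a rough NumPy sketch of the same idea under stated assumptions: the least-squares scale/shift alignment, applied to two pixels with known metric depth. The file name, pixel coordinates, and depth values below are made up for illustration, and since MiDaS predicts relative inverse depth, the known depths are converted to inverse depth before alignment.

```python
import numpy as np

def compute_scale_and_shift(prediction, target, mask):
    """Least-squares scale s and shift t such that s * prediction + t ~= target
    on the masked pixels (the closed-form 2x2 system used by the SSI alignment)."""
    a_00 = np.sum(mask * prediction * prediction)
    a_01 = np.sum(mask * prediction)
    a_11 = np.sum(mask)
    b_0 = np.sum(mask * prediction * target)
    b_1 = np.sum(mask * target)

    det = a_00 * a_11 - a_01 * a_01
    if det == 0:
        raise ValueError("Degenerate system: need at least two distinct valid pixels.")
    scale = (a_11 * b_0 - a_01 * b_1) / det
    shift = (-a_01 * b_0 + a_00 * b_1) / det
    return scale, shift

# Hypothetical example: a saved MiDaS prediction plus two pixels with known
# metric depth in meters.
prediction = np.load("midas_prediction.npy")   # hypothetical file, shape (H, W)
known = {(120, 340): 2.5, (400, 610): 7.0}     # (row, col) -> depth in meters

# Align against inverse depth, since the prediction is inverse-depth-like.
target = np.zeros_like(prediction)
mask = np.zeros_like(prediction)
for (r, c), depth_m in known.items():
    target[r, c] = 1.0 / depth_m
    mask[r, c] = 1.0

scale, shift = compute_scale_and_shift(prediction, target, mask)
aligned_inverse_depth = scale * prediction + shift
depth_meters = 1.0 / np.clip(aligned_inverse_depth, 1e-8, None)
```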

ramuneblue commented 2 years ago

Thank you for your quick comments. Let me close this question.