isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Exact distance of image #268


Shubhamkumarroy commented 4 months ago

I have the MiDaS depth output, but I need to convert it into real distance (meters/centimeters). Can anyone help me?

heyoeyo commented 4 months ago

The best approach may be to try the ZoeDepth models, which are built to output metric distances.

Otherwise, if you know the depth range of the image, you can convert the MiDaS output into true depth using the formula: True Depth = 1 / (A * normalized_midas_depth + B)

Where the variables A and B are given by:

A = (1 / min_depth) - (1 / max_depth)
B = 1 / max_depth

Here, min_depth & max_depth refer to the minimum & maximum depth values in the image (i.e. you'd need to know something like 'the closest point is 2 meters away, the farthest is 17 meters', then invert those numbers to calculate A and B). This approach will be sensitive to errors in the min/max depth values as well as in the MiDaS output (again, probably better to use the ZoeDepth models).
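
A minimal sketch of that conversion in Python (the 2 m / 17 m range is the example from above, and would need to be known for your own image):

import numpy as np

# Assumed known range: closest point 2 m, farthest point 17 m
min_depth, max_depth = 2.0, 17.0

# MiDaS output scaled to 0...1 (dummy values here; larger = closer)
normalized_midas_depth = np.linspace(0.0, 1.0, 5)

A = (1 / min_depth) - (1 / max_depth)
B = 1 / max_depth
true_depth_m = 1 / (A * normalized_midas_depth + B)

print(true_depth_m)  # 17 m at normalized depth 0, down to 2 m at normalized depth 1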

ximader commented 4 months ago

Thank you for the formulas, they work. Together with several points of known distance, this gives proper results.

import numpy as np

def depth_to_real(midas_prediction, known_points):
    '''
    Convert relative MiDaS depths to real depths using known points.
    Args:
        midas_prediction: output from MiDaS
        known_points: points on the image with known distances, as (x, y, distance)
    '''

    # normalize the midas prediction to 0...1
    midas_depth_array = midas_prediction / np.max(midas_prediction)

    if len(known_points) < 2:
        print('Not enough known points to make a real depth estimation')
        return None

    # get pairs of (normalized relative depth, real depth)
    points = np.array([(midas_depth_array[int(y), int(x)], distance) for x, y, distance in known_points])

    # solve the system of equations:
    # relative_depth * (1/min_depth) + (1 - relative_depth) * (1/max_depth) = 1/real_depth
    x = points[:, 0]      # normalized relative depth
    y = 1 / points[:, 1]  # inverse real depth
    A = np.vstack([x, 1 - x]).T

    # least-squares fit for s = 1/min_depth and t = 1/max_depth
    s, t = np.linalg.lstsq(A, y, rcond=None)[0]

    min_depth = 1 / s
    max_depth = 1 / t

    # align relative depth to real depth
    A = (1 / min_depth) - (1 / max_depth)
    B = 1 / max_depth
    midas_depth_aligned = 1 / (A * midas_depth_array + B)

    return midas_depth_aligned
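
A hypothetical usage example (the pixel coordinates and distances are made up for illustration):

import numpy as np

# Dummy MiDaS output for illustration; real code would use the model's prediction
midas_prediction = np.random.uniform(100, 1000, size=(480, 640))

# (x, y, distance_in_meters) for pixels whose true distance was measured
known_points = [(120, 300, 2.5), (500, 80, 14.0)]

depth_m = depth_to_real(midas_prediction, known_points)
print(depth_m[300, 120])  # ~2.5, matching the first known point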
Rafid00 commented 4 months ago

I am confused. Is there any way to extract the exact distance (in meters) of any pixel in the image? Assume I don't know any points other than the predicted values. Can I still get the exact distance out of the image?

heyoeyo commented 4 months ago

Is there any way to extract the exact distance (in meters) of any pixel in the image?

Metric depth models (like ZoeDepth) attempt to do this. With relative depth models (like MiDaS) you need additional information to convert the relative mapping to an absolute one.

jvishwa06 commented 4 months ago

Give me the end-to-end complete code for calculating depth using a webcam and converting the distance to meters and centimeters.
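
There's no complete answer to this in the thread, but a minimal webcam sketch based on the torch.hub usage from the MiDaS README would look something like the following. Note that it only produces relative depth; converting to meters still requires known reference points and something like the depth_to_real function above:

import cv2
import torch

# Load a relative-depth model and its transform via torch.hub (per the MiDaS README)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(img).to(device))
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False).squeeze()
    relative_depth = pred.cpu().numpy()
    # relative_depth is NOT in meters; convert with depth_to_real() and known points
    disp = cv2.normalize(relative_depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imshow("MiDaS relative depth (q to quit)", disp)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()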

RoyAmoyal commented 3 months ago

Is there any way to extract the exact distance (in meters) of any pixel in the image?

Metric depth models (like ZoeDepth) attempt to do this. With relative depth models (like MiDaS) you need additional information to convert the relative mapping to an absolute one.

If you know the real depth (meters) for 1 pixel, would it be enough to convert the rest of the depths to real distance too?

heyoeyo commented 3 months ago

If you know the real depth (meters) for 1 pixel, would it be enough to convert the rest of the depths to real distance too?

Not quite; it's sort of a '2 knowns to figure out 2 unknowns' situation. You'd need to know the true depth for at least 2 pixels to be able to solve for A and B in the equation. In general though, you'd want to use many more than 2 points, since any error on those 2 pixels will lead to errors in estimating A and B. You might want to check out issue #171, where this was discussed in more detail (or check out the code from @ximader above).

That being said, if you want to try to fit using only two pixels, you can set up a system of 2 equations using the known pixels (and the equation from before) and solve it to figure out A and B. If your 2 known true depths are d1 and d2 and correspond to pixels with relative midas depths of m1 and m2 (respectively), then as far as I can tell, the parameters are given by:

Let:
  inv_d1 = 1 / d1
  inv_d2 = 1 / d2

then:

A = (inv_d2 - inv_d1) / (m2 - m1)
B = inv_d1 - m1 * A

And for clarity, I'm just getting this by re-arranging the equations:

d1 = 1 / (A * m1 + B)
d2 = 1 / (A * m2 + B)
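
In code, the two-point solution is just a few lines (d1, d2, m1, m2 are hypothetical values here):

# Two pixels with known true depths (meters) and their normalized MiDaS values
d1, d2 = 2.5, 14.0   # known true depths (hypothetical)
m1, m2 = 0.85, 0.10  # normalized MiDaS depths at those pixels

inv_d1, inv_d2 = 1 / d1, 1 / d2
A = (inv_d2 - inv_d1) / (m2 - m1)
B = inv_d1 - m1 * A

# Sanity check: the fit recovers the known depths exactly
assert abs(1 / (A * m1 + B) - d1) < 1e-9
assert abs(1 / (A * m2 + B) - d2) < 1e-9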
joyyang1215 commented 1 month ago

I wonder: if part of the ego vehicle is visible in the image, can I pick any two points on the ego vehicle as distance reference points to calculate the scale and shift?

For example, the distances of the red and blue points are known (see the attached image, midas_abs_dist).

That way, I could calibrate the scale and shift in every frame.

heyoeyo commented 1 month ago

Ya, that's a clever idea to stabilize the prediction. If that's still inconsistent, it should even be possible to grab the entire region of pixels belonging to the car and use a least-squares type of fit (like what @ximader posted) to further reduce the sensitivity to errors on individual pixels.
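
A sketch of that region-based fit, assuming a boolean car_mask selecting the ego-vehicle pixels and a true_depth_m array holding their known distances (both names are hypothetical):

import numpy as np

def fit_scale_shift(normalized_midas, car_mask, true_depth_m):
    '''Least-squares fit of A, B in 1/depth = A*m + B over a pixel region'''
    m = normalized_midas[car_mask]          # relative depths inside the region
    y = 1 / true_depth_m[car_mask]          # inverse of the known true depths
    M = np.vstack([m, np.ones_like(m)]).T   # columns: [m, 1]
    A, B = np.linalg.lstsq(M, y, rcond=None)[0]
    return A, B

# Per frame: refit on the ego-vehicle pixels, then convert the whole image
# A, B = fit_scale_shift(normalized_midas, car_mask, true_depth_m)
# depth_m = 1 / (A * normalized_midas + B)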