Shubhamkumarroy opened this issue 9 months ago
The best approach may be to try the ZoeDepth models, which are built to give metric distances as an output.
Otherwise, if you know the depth range of the image, you can convert the MiDaS output into true depth using the formula:
True Depth = 1 / (A * normalized_midas_depth + B)
where the variables A and B are given by:
A = (1 / min_depth) - (1 / max_depth)
B = 1 / max_depth
Here, the min_depth & max_depth refer to the minimum & maximum depth values in the image (i.e. you'd need to know something like, 'the closest point is 2 meters away, the farthest is 17 meters'. Then invert those numbers to calculate A and B). Though this approach will be sensitive to errors in the min/max depth values as well as the midas output (again, probably better to use the ZoeDepth models).
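As a worked example, here's a minimal sketch of that conversion (the 2 m / 17 m values are just the hypothetical ones from above):

    import numpy as np

    # hypothetical example: closest point is 2 meters away, farthest is 17 meters
    min_depth = 2.0
    max_depth = 17.0

    A = (1 / min_depth) - (1 / max_depth)  # ~0.4412
    B = 1 / max_depth                      # ~0.0588

    # MiDaS outputs inverse depth, so after normalizing to 0...1,
    # 1.0 is the closest point and 0.0 is the farthest
    normalized_midas_depth = np.array([0.0, 0.5, 1.0])
    true_depth = 1 / (A * normalized_midas_depth + B)
    print(true_depth)  # [17.0  ~3.58  2.0]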
Thank you for the given formulas; they work. Together with several points of known distance, this gives proper results.
import numpy as np

def depth_to_real(midas_prediction, known_points):
    '''
    Convert relative MiDaS depths to real depths using known points.
    Args:
        midas_prediction: output from MiDaS (2D array of inverse relative depth)
        known_points: points on the image with known distances, as (x, y, distance) tuples
    '''
    # normalize the MiDaS prediction to 0...1
    midas_depth_array = midas_prediction / np.max(midas_prediction)
    if len(known_points) >= 2:
        # get pairs of (normalized relative depth, real depth)
        points = np.array([(midas_depth_array[int(y), int(x)], distance)
                           for x, y, distance in known_points])
        # solve the system of equations:
        # relative_depth*(1/min_depth) + (1-relative_depth)*(1/max_depth) = 1/real_depth
        x = points[:, 0]      # normalized relative depth
        y = 1 / points[:, 1]  # inverse real depth
        A = np.vstack([x, 1 - x]).T
        s, t = np.linalg.lstsq(A, y, rcond=None)[0]
        min_depth = 1 / s
        max_depth = 1 / t
    else:
        print('Not enough known points to make real depth estimation')
        return None
    # align relative depth to real depth
    A = (1 / min_depth) - (1 / max_depth)
    B = 1 / max_depth
    midas_depth_aligned = 1 / (A * midas_depth_array + B)
    return midas_depth_aligned
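Usage might then look like this (the depth map and reference points below are made up purely for illustration):

    # hypothetical inputs: a synthetic MiDaS-style inverse-depth map and
    # three reference points given as (x, y, distance in meters)
    midas_prediction = np.random.rand(480, 640) * 1000.0
    known_points = [(120, 300, 2.5), (400, 250, 7.0), (610, 180, 15.0)]

    depth_m = depth_to_real(midas_prediction, known_points)
    if depth_m is not None:
        print('Depth at pixel (320, 240): %.2f m' % depth_m[240, 320])
        print('Same distance in cm: %.1f cm' % (depth_m[240, 320] * 100))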
I am confused. Is there any way to extract the exact distance (in meters) of any pixel in the image? Assume I don't know any points other than the predicted values. Can I still get the exact distance out of the image?
Is there any way to extract the exact distance (in meters) of any pixel in the image?
Metric depth models (like ZoeDepth) attempt to do this. With relative depth models (like MiDaS) you need additional information to convert the relative mapping to an absolute one.
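For reference, a minimal sketch of getting metric depth out of ZoeDepth via torch.hub (the ZoeD_N entry point and infer_pil call follow the isl-org/ZoeDepth README; treat the exact API as an assumption to verify against that repo):

    import torch
    from PIL import Image

    # load a ZoeDepth model through torch.hub (entry point per the
    # isl-org/ZoeDepth README; verify against the repo before relying on it)
    zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
    zoe = zoe.to("cuda" if torch.cuda.is_available() else "cpu").eval()

    img = Image.open("example.jpg").convert("RGB")  # hypothetical input image
    depth_m = zoe.infer_pil(img)  # numpy array of per-pixel depth in meters
    print('Depth at pixel (320, 240): %.2f m' % depth_m[240, 320])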
Can someone give me end-to-end code for calculating depth from a webcam and converting the distances into meters and centimeters?
Is there any way to extract the exact distance (in meters) of any pixel in the image?
Metric depth models (like ZoeDepth) attempt to do this. With relative depth models (like MiDaS) you need additional information to convert the relative mapping to an absolute one.
If you know the real depth (meters) for 1 pixel, would it be enough to convert the rest of the depths to real distance too?
If you know the real depth (meters) for 1 pixel, would it be enough to convert the rest of the depths to real distance too?
Not quite, it's sort of a '2 knowns to figure out 2 unknowns' situation. You'd need to know the true depth for at least 2 pixels to be able to solve for A and B in the equation. In general though, you'd want to use many more than 2 points, since any error on those 2 pixels will lead to errors in estimating A and B. You might want to check out issue #171, where this was discussed in more detail (or check out the code from @ximader above).
That being said, if you want to try to fit using only two pixels, you can set up a system of 2 equations using the known pixels (and the equation from before) and solve it to figure out A and B. If your 2 known true depths are d1 and d2, and they correspond to pixels with relative MiDaS depths of m1 and m2 (respectively), then as far as I can tell, the parameters are given by:
Let:
inv_d1 = 1 / d1
inv_d2 = 1 / d2
then:
A = (inv_d2 - inv_d1) / (m2 - m1)
B = inv_d1 - m1 * A
And for clarity, I'm just getting this by re-arranging the equations:
d1 = 1 / (A * m1 + B)
d2 = 1 / (A * m2 + B)
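As a quick sanity check, here is the same two-point solve in Python (d1, d2, m1, m2 are placeholder values, not from any real image):

    # two known true depths (meters) and their normalized MiDaS values (made up)
    d1, d2 = 2.0, 17.0
    m1, m2 = 1.0, 0.0  # MiDaS is inverse depth: closer pixel -> larger value

    inv_d1 = 1 / d1
    inv_d2 = 1 / d2

    A = (inv_d2 - inv_d1) / (m2 - m1)
    B = inv_d1 - m1 * A

    # check: the fitted parameters should reproduce the known depths
    assert abs(1 / (A * m1 + B) - d1) < 1e-9
    assert abs(1 / (A * m2 + B) - d2) < 1e-9
    print(A, B)  # ~0.4412, ~0.0588 for these example values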
I wonder: if part of the ego vehicle is visible in the image, can I pick two points on the ego vehicle as the distance reference points to calculate the scale and shift?
For example, suppose the distances of the red and blue points are known.
Then I could calibrate the scale and shift in every frame.
Yeah, that's a clever idea to stabilize the prediction. If that's still inconsistent, it should even be possible to grab the entire region of pixels belonging to the car and use a least-squares type of fit (like what @ximader posted) to further reduce the sensitivity to errors on individual pixels.
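A rough sketch of that region-based fit (the mask and the per-pixel true depths for the ego vehicle are assumed to be available; the fit itself is the same inverse-depth least squares as in @ximader's function above):

    import numpy as np

    def fit_scale_shift(midas_depth, mask, true_depth):
        '''
        Fit A, B in 1/true_depth = A * midas_depth + B over a masked region.
        midas_depth: MiDaS output normalized to 0...1
        mask: boolean array selecting ego-vehicle pixels with known depth
        true_depth: array holding the known metric depths at those pixels
        '''
        m = midas_depth[mask]
        y = 1 / true_depth[mask]
        M = np.vstack([m, np.ones_like(m)]).T
        A, B = np.linalg.lstsq(M, y, rcond=None)[0]
        return A, B

    # per frame: A, B = fit_scale_shift(...), then depth_m = 1 / (A * midas_depth + B)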
Is there any way to extract the exact distance (in meters) of any pixel in the image?
Metric depth models (like ZoeDepth) attempt to do this. With relative depth models (like MiDaS) you need additional information to convert the relative mapping to an absolute one.
If you know the real depth (meters) for 1 pixel, would it be enough to convert the rest of the depths to real distances too? But how do you get or know the real depth for that 1 pixel?
If you know the real depth (meters) for 1 pixel... But how do you get or know the real depth for that 1 pixel?
I think the idea is that if you know the depth for some part of the image, then you can convert the relative depths into real distances. If you don't know any depths, then it's better to use the metric depth (ZoeDepth) models.
I have the MiDaS depth output, but I need to convert it into distances like meters and centimeters. Can anyone help me?