isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

MiDaS depth range? #85

Open phongnhhn92 opened 3 years ago

phongnhhn92 commented 3 years ago

Hi, I am new to MiDaS. Can I ask what the depth range of the predicted depth map is? Is it [0, 1]?

riteshpakala commented 3 years ago

No. You should pull the min and max from the output and use them to normalize the data, e.g. into a grayscale range.
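For instance, a minimal sketch of that per-image normalization (assuming prediction is the raw MiDaS output as a NumPy array):

import numpy as np

def to_grayscale(prediction: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw MiDaS prediction into [0, 255] (uint8)."""
    d_min, d_max = prediction.min(), prediction.max()
    if d_max - d_min > np.finfo("float").eps:
        out = (prediction - d_min) / (d_max - d_min)  # now in [0, 1]
    else:
        out = np.zeros_like(prediction)  # flat prediction, avoid divide-by-zero
    return (out * 255).astype(np.uint8)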

elenacliu commented 1 year ago

@riteshpakala so what is the unit of the raw depth prediction? Is it inverse depth or plain depth?

riteshpakala commented 1 year ago

@elenacliu it should be inverted, 1 = near and 0 = far

elenacliu commented 1 year ago

@riteshpakala I have printed out the values of the predicted depth and found that they are not in the range [0, 1], but they do satisfy the property that smaller means farther and bigger means nearer.
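If I understand the paper correctly, the prediction is relative inverse depth, defined only up to an unknown scale and shift, which would explain a scene-dependent range. Here is a sketch of aligning it to metric depth, assuming at least two pixels with known ground-truth depth (function and variable names are just for illustration):

import numpy as np

def align_to_metric(prediction, known_px, known_depths):
    """Fit 1/depth = s * prediction + t on reference pixels, then invert
    the aligned prediction to obtain a metric depth map."""
    pred_vals = np.array([prediction[r, c] for r, c in known_px])
    inv_depths = 1.0 / np.asarray(known_depths, dtype=np.float64)
    # Least-squares fit for scale s and shift t (needs >= 2 reference points).
    A = np.stack([pred_vals, np.ones_like(pred_vals)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, inv_depths, rcond=None)
    aligned = s * prediction + t
    return 1.0 / np.clip(aligned, 1e-8, None)  # metric depth everywhere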

riteshpakala commented 1 year ago

@elenacliu oh I see, I may be confusing it with another depth model. Is it possible that it is still in the range 0-255 and needs to be normalized in post, i.e. value/255?

Edit: I found some old code and, yeah, I was taking the min and max of the output to normalize to [0, 1]. This is pretty costly for realtime, though.

elenacliu commented 1 year ago

The output is like this:

depth_map: [[ 2320.3528   2317.7908   2311.3635  ...   987.1105    834.85095   765.7877 ]
 [ 2317.6309   2315.899    2311.3477  ...  1015.4411    889.8536    833.15   ]
 [ 2310.6614   2310.7517   2310.259   ...  1078.0278   1009.96857   980.0071 ]
 ...
 [10098.441   10137.251   10221.653   ...  9858.545    9855.353    9854.753 ]
 [ 9902.26     9975.874   10136.235   ...  9838.973    9833.239    9830.733 ]
 [ 9814.838    9903.92    10097.963   ...  9830.934    9822.333    9818.274 ]]

@riteshpakala

elenacliu commented 1 year ago

The range just confuses me. I found a code snippet that processes the normal map: https://github.com/graemeniedermayer/stable-diffusion-webui-normalmap-script/blob/main/scripts/normalmap.py#L285

# output
normal = prediction
numbytes=2
normal_min = normal.min()
normal_max = normal.max()
max_val = (2**(8*numbytes))-1

# check output before normalizing and mapping to 16 bit
if normal_max - normal_min > np.finfo("float").eps:
    out = max_val * (normal - normal_min) / (normal_max - normal_min)
else:
    out = np.zeros(normal.shape)

# single channel, 16 bit image
img_output = out.astype("uint16")

# invert normal map
if not (invert_normal ^ model_type == 0):
    img_output = cv2.bitwise_not(img_output)

img_output = (scale_depth * img_output).astype("uint16")

# three channel, 8 bits per channel image
img_output2 = np.zeros_like(processed.images[count])
img_output2[:,:,0] = img_output / 256.0
img_output2[:,:,1] = img_output / 256.0
img_output2[:,:,2] = img_output / 256.0

#pre blur (only blurs z-axis)
if pre_gaussian_blur:
    img_output = cv2.GaussianBlur(img_output, (pre_gaussian_blur_kernel, pre_gaussian_blur_kernel), pre_gaussian_blur_kernel)

# take gradients 
if sobel_gradient:
    zx = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 1, 0, ksize=sobel_kernel)     
    zy = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 0, 1, ksize=sobel_kernel) 
else:
    zy, zx = np.gradient(img_output)

# combine and normalize gradients.
normal = np.dstack((zx, -zy, np.ones_like(img_output)))
n = np.linalg.norm(normal, axis=2)
normal[:, :, 0] /= n
normal[:, :, 1] /= n
normal[:, :, 2] /= n

# post blur (will break normal maps unitary values)
if post_gaussian_blur:
    normal = cv2.GaussianBlur(normal, (post_gaussian_blur_kernel, post_gaussian_blur_kernel), post_gaussian_blur_kernel)

# offset and rescale values to be in 0-255
normal += 1
normal /= 2
normal *= 255   
normal = normal.astype(np.uint8)

It seems that the depth doesn't have a constant range.
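So any export to an image format needs that per-image normalization first. A minimal sketch of just the depth part, saving a single-channel 16-bit PNG (the path is only an example):

import cv2
import numpy as np

def write_depth_png(path, prediction):
    """Min-max normalize a raw MiDaS prediction per image and save it
    as a 16-bit PNG (path should end in .png)."""
    d_min, d_max = prediction.min(), prediction.max()
    max_val = 2 ** 16 - 1
    if d_max - d_min > np.finfo("float").eps:
        out = max_val * (prediction - d_min) / (d_max - d_min)
    else:
        out = np.zeros(prediction.shape)  # degenerate flat prediction
    cv2.imwrite(path, out.astype("uint16"))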

riteshpakala commented 1 year ago

@elenacliu oh wait, I was referring to another thread actually. Ignore that (now deleted) comment.

What is the colorspace of your input image, BGR or RGB? I am just wondering whether that is the edge case I ran into when I was seeing larger numbers.
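For reference, OpenCV's imread returns BGR while the MiDaS preprocessing expects RGB; a quick sketch of what I mean, assuming you load the image yourself instead of going through run.py:

import cv2

img = cv2.imread("input.jpg")               # OpenCV loads as BGR, uint8
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB before the MiDaS transform
img = img / 255.0                           # float in [0, 1], like the repo's read_image helper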

elenacliu commented 1 year ago

You mean the image that I gave MiDaS to predict the depth from? I just ran

python run.py --model_type dpt_beit_large_512 

as the README.md instructs.
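In case it helps, here is a small sketch for inspecting the raw prediction range directly through torch.hub (the entry-point and transform names are my best guess from the repo's hubconf):

import cv2
import torch

# Load the MiDaS v3.1 BEiT-L 512 model and its matching transform.
model = torch.hub.load("intel-isl/MiDaS", "DPT_BEiT_L_512")
model.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.beit512_transform  # assumed name for the 512px transform

img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = model(transform(img))
print(prediction.min().item(), prediction.max().item())  # scene-dependent range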