Open phongnhhn92 opened 3 years ago
no, you should be pulling the min and max from the output and normalizing the data into a grayscale space, for instance, based off of that.
@riteshpakala so what is the unit of the raw depth prediction? Is it inverted depth or just the depth?
@elenacliu it should be inverted, 1 = near and 0 = far
@riteshpakala I have printed out the values of the predicted depth and found they are not in the range [0,1], but they do satisfy the inverted property: smaller values are farther, larger values are nearer.
@elenacliu oh I see, I may be confusing it with another depth model. Is it possible that it is still in the range 0-255 and needs to be normalized in post, value/255?
Edit: I found some old code, and yeah, I was taking the min and max of the output to normalize to [0,1]. This is pretty costly for realtime, though.
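A minimal sketch of that per-frame min/max normalization (the function name is mine, not from MiDaS; the epsilon guard mirrors the snippet further down this thread):

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    """Map a raw prediction to [0, 1] using the frame's own min and max."""
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min > np.finfo("float").eps:
        return (depth - d_min) / (d_max - d_min)
    # Flat input: avoid division by zero and return all zeros.
    return np.zeros_like(depth, dtype=np.float64)
```

Since the min and max are recomputed per frame, the scale can flicker between frames, which is part of why this is awkward for realtime use.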
The output is like this:
depth_map: [[ 2320.3528 2317.7908 2311.3635 ... 987.1105 834.85095 765.7877 ] [ 2317.6309 2315.899 2311.3477 ... 1015.4411 889.8536 833.15 ] [ 2310.6614 2310.7517 2310.259 ... 1078.0278 1009.96857 980.0071 ] ... [10098.441 10137.251 10221.653 ... 9858.545 9855.353 9854.753 ] [ 9902.26 9975.874 10136.235 ... 9838.973 9833.239 9830.733 ] [ 9814.838 9903.92 10097.963 ... 9830.934 9822.333 9818.274 ]]
@riteshpakala
The range just confuses me, and I have found a code snippet that processes the normal map: https://github.com/graemeniedermayer/stable-diffusion-webui-normalmap-script/blob/main/scripts/normalmap.py#L285
# output
normal = prediction
numbytes = 2
normal_min = normal.min()
normal_max = normal.max()
max_val = (2**(8*numbytes))-1

# check output before normalizing and mapping to 16 bit
if normal_max - normal_min > np.finfo("float").eps:
    out = max_val * (normal - normal_min) / (normal_max - normal_min)
else:
    out = np.zeros(normal.shape)

# single channel, 16 bit image
img_output = out.astype("uint16")

# invert normal map
if not (invert_normal ^ model_type == 0):
    img_output = cv2.bitwise_not(img_output)
img_output = (scale_depth * img_output).astype("uint16")

# three channel, 8 bits per channel image
img_output2 = np.zeros_like(processed.images[count])
img_output2[:, :, 0] = img_output / 256.0
img_output2[:, :, 1] = img_output / 256.0
img_output2[:, :, 2] = img_output / 256.0

# pre blur (only blurs z-axis)
if pre_gaussian_blur:
    img_output = cv2.GaussianBlur(img_output, (pre_gaussian_blur_kernel, pre_gaussian_blur_kernel), pre_gaussian_blur_kernel)

# take gradients
if sobel_gradient:
    zx = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 1, 0, ksize=sobel_kernel)
    zy = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 0, 1, ksize=sobel_kernel)
else:
    zy, zx = np.gradient(img_output)

# combine and normalize gradients.
normal = np.dstack((zx, -zy, np.ones_like(img_output)))
n = np.linalg.norm(normal, axis=2)
normal[:, :, 0] /= n
normal[:, :, 1] /= n
normal[:, :, 2] /= n

# post blur (will break normal maps unitary values)
if post_gaussian_blur:
    normal = cv2.GaussianBlur(normal, (post_gaussian_blur_kernel, post_gaussian_blur_kernel), post_gaussian_blur_kernel)

# offset and rescale values to be in 0-255
normal += 1
normal /= 2
normal *= 255
normal = normal.astype(np.uint8)
It seems that the depth doesn't have a constant range.
@elenacliu oh wait, I was referring to another thread actually. Ignore that (now deleted) comment.
What is the colorspace of your input image? Is it BGR or RGB? I am just wondering if that was a possible edge case I experienced when I was seeing larger numbers.
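For what it's worth, OpenCV's `imread` returns images in BGR order while most model pipelines expect RGB; a quick sketch of the conversion with pure NumPy (no cv2 needed, toy pixel values are mine):

```python
import numpy as np

# A 1x1 pure-blue pixel in BGR order (B=255, G=0, R=0).
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis converts BGR <-> RGB.
rgb = bgr[..., ::-1]
# In RGB order the blue value now sits in the last channel.
```

Feeding BGR data into a model trained on RGB inputs is a classic source of subtly wrong predictions, so it's worth ruling out.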
You mean the image that I gave MiDaS to predict the depth from? I just ran
python run.py --model_type dpt_beit_large_512
as the README.md instructs.
Hi, I am new to MiDaS. Can I ask what the depth range of the predicted depth map is? Is it [0,1]?