isl-org / ZoeDepth

Metric depth estimation from a single image
MIT License

Getting Metric Depth #10

Open talasalim opened 1 year ago

talasalim commented 1 year ago

How can I use my data to get the metric depth at a pixel level using the ZoeD model?

shariqfarooq123 commented 1 year ago

Could you describe in more detail what problems you are facing? The output of the model is the metric depth. If you think the units are wildly inaccurate, try with config_mode=eval while loading the model. You can use ZoeD_N for indoor scenes, ZoeD_K for outdoor road scenes, and ZoeD_NK for generic scenes.
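
For reference, a minimal sketch of reading the prediction at a single pixel (assuming infer_pil's default numpy output; the file name and coordinates are placeholders):

import torch
from PIL import Image

zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
image = Image.open("image.png").convert("RGB")
depth = zoe.infer_pil(image)  # HxW numpy array of (approximate) meters

x, y = 100, 50  # pixel coordinate of interest
print(f"Depth at ({x}, {y}): {depth[y, x]:.2f} m")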

Teifoc commented 1 year ago

@shariqfarooq123 I think @talasalim is asking how to get a metric distance (e.g. in meters), such as the distance from the camera to an object surface, possibly by providing two x,y coordinates in the original picture that are known to span a fixed length, for calibration.

talasalim commented 1 year ago

@shariqfarooq123 @Teifoc Yes that is what I meant. Is there a way to get the absolute metric depth at a certain x,y coordinate?

VibAltekar commented 1 year ago

Following up here. I think you might need to provide the camera intrinsics that are unique per camera but I'm assuming these are known for the dataset in question. @talasalim @shariqfarooq123 @Teifoc any ideas?

Yarroudh commented 1 year ago

In the file geometry.py I found two functions, get_intrinsics and depth_to_points. I think if we change depth_to_points as follows, we can define the camera intrinsics and extrinsics as we want:

import numpy as np

def depth_to_points(depth, K=None, R=None, t=None):
    # Fall back to the default intrinsics estimated from the image size
    if K is None:
        K = get_intrinsics(depth.shape[1], depth.shape[2])
    Kinv = np.linalg.inv(K)
    # Default extrinsics: identity rotation, zero translation
    if R is None:
        R = np.eye(3)
    if t is None:
        t = np.zeros(3)
    # ... rest of the original back-projection code unchanged ...
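
For example, a hypothetical usage sketch (fx, fy, cx, cy are made-up values; predicted_depth is assumed to be the HxW output of infer_pil):

import numpy as np

# Hypothetical intrinsics -- replace with your camera's real calibration
# (focal lengths fx, fy and principal point cx, cy, all in pixels)
fx, fy = 1000.0, 1000.0
cx, cy = 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

depth = predicted_depth[None]  # shape (1, H, W), matching depth.shape[1:3] above
points = depth_to_points(depth, K=K)  # back-project with your own intrinsics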
Sivloc commented 1 year ago

Following up on this, does somebody know which unit is used for the metric depth? Comparing my results to ground truth data ranging from 5 to 45 meters, I have values from 1200 to 8400 in my ZoeDepth output. Is this supposed to be millimeters? Steps of 5 mm?

ariqhadi commented 1 year ago

Hello, sorry, I'm quite a newbie here. Are the numbers you were mentioning the result of zoe.infer_pil(image)? Can we directly use that as the estimated metric depth value, or are there other steps to get it?

kwea123 commented 1 year ago

Although the model is trained to predict metric depth, due to the limited training data size I think the prediction is still not metrically accurate, but it should be scale-aware (i.e. if an object is twice as far as another, the ratio should still be correct even if the absolute depths are wrong). In short, I think the numbers are still only correct "up to some scale".

Sivloc commented 1 year ago

Honestly, I get pretty good results taking the output of zoe.infer_pil(image) directly as millimeters, but some of these algorithms produce an output of the form MetricDepth = Scale * OutputDepth + Shift, where scale and shift depend on your camera parameters. If you're not sure about that, you can estimate those parameters with linear regression, given that you have ground truth.
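
A minimal sketch of that calibration step (pred and gt are placeholder names for an aligned predicted depth map and ground-truth depth map; synthetic data stands in for real maps here):

import numpy as np

def fit_scale_shift(pred, gt):
    # Only use pixels where ground truth is valid
    valid = gt > 0
    x = pred[valid].ravel()
    y = gt[valid].ravel()
    # Least-squares fit of gt ~ scale * pred + shift
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale, shift

# Synthetic stand-ins for a real prediction and ground truth (in meters):
rng = np.random.default_rng(0)
gt = rng.uniform(5.0, 45.0, size=(480, 640))
pred = 180.0 * gt + 300.0  # pretend the model output is in arbitrary units

scale, shift = fit_scale_shift(pred, gt)
calibrated = scale * pred + shift  # now in ground-truth units (meters)
print(f"scale={scale:.5f}, shift={shift:.3f}")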

kwea123 commented 1 year ago

The model is trained to predict meters though

Sivloc commented 1 year ago

Could you describe in more detail what problems you are facing? The output of the model is the metric depth. If you think the units are wildly inaccurate, try with config_mode=eval while loading the model. You can use ZoeD_N for indoor scenes, ZoeD_K for outdoor road scenes, and ZoeD_NK for generic scenes.

Well, it says the output is metric, not meters, right? At least in my case, if the output were actually meters, it would be insanely inaccurate.

kwea123 commented 1 year ago

The depth in training and eval is converted to meters:

https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/data_mono.py#L353-L354
https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/ddad.py#L98
https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/data/diml_indoor_test.py#L97-L98

shariqfarooq123 commented 1 year ago

As @kwea123 pointed out, the model was trained with meters as the unit of depth, so the output is always supposed to be in meters. However, the input padding in the infer and infer_pil APIs can easily change the overall scale of the output, though it should be more or less consistent.

Try turning the padding off with pad_input=False (at the cost of border artifacts, see zoedepth.models.depth_model:L57)

TLDR:

import torch
from PIL import Image

# Load your input image (the path is a placeholder)
image = Image.open("image.png").convert("RGB")

zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
predicted_depth = zoe.infer_pil(image, pad_input=False)  # Better 'metric' accuracy

Let me know if this helps

Sivloc commented 1 year ago

Okay, thanks a lot! I was actually using the save_raw_16bit function from misc.py, which multiplies all values by 256:

import numpy as np
import torch
from PIL import Image

def save_raw_16bit(depth, fpath="raw.png"):
    if isinstance(depth, torch.Tensor):
        depth = depth.squeeze().cpu().numpy()

    assert isinstance(depth, np.ndarray), "Depth must be a torch tensor or numpy array"
    assert depth.ndim == 2, "Depth must be 2D"
    depth = depth * 256  # scale for 16-bit png
    depth = depth.astype(np.uint16)
    depth = Image.fromarray(depth)
    depth.save(fpath)
    print("Saved raw depth to", fpath)

No wonder I had bad metrics while comparing to ground truth... Thanks for pointing that out!

hpstyl commented 1 year ago

Interesting! So now are you able to reproduce the ground truth metric depth?

Sivloc commented 1 year ago

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like: [comparison image] The background is ~30 meters farther than predicted. Also, I should mention that I used the zoedepth_nk model.

jorismak commented 1 year ago

Following up on this, does somebody know which unit is used for the metric depth? Comparing my results to ground truth data ranging from 5 to 45 meters, I have values from 1200 to 8400 in my ZoeDepth output. Is this supposed to be millimeters? Steps of 5 mm?

If you look at the code of the utility function save_raw_16bit (or something like that), you'll see they take the data, multiply it by 256 and round it off to an unsigned 16-bit integer (so 0 - 65535).

That means you can a) use the raw data yourself, since it is floating point numbers that represent meters as far as I know (the model can be off of course), or b) read in the raw 16-bit integers you might already have and divide the values by 256 to get close to the original float output of the model.

The values you mention, divided by 256, come close to the values you describe as the ones you are looking for.

(Edit: upon reloading I now see there were already replies and this has been said before, sorry. When I opened the issue, that part of the discussion wasn't visible to me.)

GinRawin commented 9 months ago

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like: [comparison image] The background is ~30 meters farther than predicted. Also, I should mention that I used the zoedepth_nk model.

When I use the function save_raw_16bit, I only get a totally black picture. How do you get the real distance? Which function do you use? Thank you for your answer!

jorismak commented 9 months ago

If using save_raw_16bit: you get back a grayscale image, in other words width x height values, each a number between 0 and 65535 (the 16-bit integer range). Most viewers display such an image as nearly black, because typical depth values only occupy a small part of that range.

Divide each value by 256 to get what the model predicts as meters. Of course it depends on the camera, model accuracy, upscaling and all that, but the numbers save_raw_16bit writes out are meters multiplied by 256, so divide by 256 to get back some sort of meters.
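
A minimal sketch of that conversion (the file name is a placeholder):

import numpy as np
from PIL import Image

# Load the 16-bit PNG written by save_raw_16bit; pixel values are meters * 256
raw = np.asarray(Image.open("raw.png"), dtype=np.float32)
depth_m = raw / 256.0  # back to (approximate) meters

print(f"min: {depth_m.min():.2f} m, max: {depth_m.max():.2f} m")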

GinRawin commented 9 months ago

If using save_raw_16bit: you get back a grayscale image, in other words width x height values, each a number between 0 and 65535 (the 16-bit integer range).

Divide each value by 256 to get what the model predicts as meters. Of course it depends on the camera, model accuracy, upscaling and all that, but the numbers save_raw_16bit writes out are meters multiplied by 256, so divide by 256 to get back some sort of meters.

Thank you for your help! My code was like this:

import torch
from PIL import Image

image = Image.open("image.png").convert("RGB")
model_zoe_n = torch.hub.load(".", "ZoeD_NK", pretrained=True, source="local")
DEVICE = "cuda:1" if torch.cuda.is_available() else "cpu"
zoe = model_zoe_n.to(DEVICE)
depth = zoe.infer_pil(image)

I find that the numbers save_raw_16bit returns are the depth multiplied by 256, so I think the depth there should be the real distance in the photo? If I am right, the result is bad. Maybe the reason is that the camera is too close to the object in my photo; it is only about 20 cm from the camera.

807xuan commented 7 months ago

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like: [comparison image] The background is ~30 meters farther than predicted. Also, I should mention that I used the zoedepth_nk model.

May I ask how you generated your result graph?

Flaviaaa123 commented 3 months ago

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like: [comparison image] The background is ~30 meters farther than predicted. Also, I should mention that I used the zoedepth_nk model.

Hello, can you please tell me how you generate ground truth for an image? I also want to compare my predicted depth with ground truth. Thanks!

Sivloc commented 3 months ago

You can't generate the ground truth, you have to actually measure it. You have two options (that I know of):

Sivloc commented 3 months ago

Well, it sure is better than before, but it still struggles with the background of my ground truth. Here is what it looks like: [comparison image] The background is ~30 meters farther than predicted. Also, I should mention that I used the zoedepth_nk model.

May I ask how you generated your result graph?

Sorry, I just saw your question. Which result graph are you talking about? For all three of them, I plotted the output matrix. I don't think I still have the code I used.