lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

encountered an AssertionError while using UniDepthV2 to predict depth #45

Open red-liu opened 1 month ago

red-liu commented 1 month ago

I really appreciate your great work. But when I used UniDepthV2 to predict depth, I encountered an AssertionError, as shown below:

 File "/home/user/app/app.py", line 23, in <module>
 predictions = model.infer(rgb)
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/app/unidepth/models/unidepthv2/unidepthv2.py", line 229, in infer
features, tokens = self.pixel_encoder(rgbs)
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/app/unidepth/models/backbones/dinov2.py", line 324, in forward
x = self.prepare_tokens_with_masks(x, masks)
File "/home/user/app/unidepth/models/backbones/dinov2.py", line 312, in prepare_tokens_with_masks
x = x + self.interpolate_pos_encoding(x, w, h)
File "/home/user/app/unidepth/models/backbones/dinov2.py", line 297, in interpolate_pos_encoding
and int(h0) == patch_pos_embed.shape[-1]

My result is: int(w0) = 57, patch_pos_embed.shape[-2] = 57, and int(h0) = 43, patch_pos_embed.shape[-1] = 42.
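For reference, a minimal sketch of how the model was being called, following the README-style usage (the checkpoint name and image path are placeholders, not the exact script from the traceback above):

    import numpy as np
    import torch
    from PIL import Image
    from unidepth.models import UniDepthV2

    # Checkpoint name assumed from the README; substitute the one actually used.
    model = UniDepthV2.from_pretrained("lpiccinelli/unidepth-v2-vitl14")

    # infer() takes a (3, H, W) RGB tensor; here the image is a portrait phone photo.
    rgb = torch.from_numpy(np.array(Image.open("photo.jpg"))).permute(2, 0, 1)

    predictions = model.infer(rgb)   # fails inside interpolate_pos_encoding as shown above
    depth = predictions["depth"]     # metric depth map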

red-liu commented 1 month ago

Another question: if I transform a picture to a lower or higher resolution, how will the result change?

lpiccinelli-eth commented 1 month ago

Thanks for using our work!

What is your input shape (or the config you are passing to the model, like pixels_bounds, etc.)?

To answer your question: the results may change a bit, but we expect them to be quite consistent, something that is not typical of previous works, especially in the case of metric estimation.
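One way to check this empirically, assuming the README-style API sketched above (model.infer returning a "depth" entry), is to run the same image at two resolutions and compare the metric depths; this is only a sketch:

    import torch.nn.functional as F

    def depth_at(model, rgb, scale=1.0):
        # rgb: (3, H, W) uint8 tensor; optionally resize before inference.
        if scale != 1.0:
            rgb = F.interpolate(rgb.unsqueeze(0).float(), scale_factor=scale,
                                mode="bilinear", align_corners=False).squeeze(0).byte()
        return model.infer(rgb)["depth"]  # metric depth, (1, 1, h, w)

    d_full = depth_at(model, rgb)
    d_half = depth_at(model, rgb, scale=0.5)

    # Bring both predictions to a common size before comparing in metric units.
    d_half_up = F.interpolate(d_half, size=d_full.shape[-2:], mode="bilinear", align_corners=False)
    rel_diff = ((d_full - d_half_up).abs() / d_full.clamp(min=1e-3)).mean()
    print(f"mean relative difference across resolutions: {rel_diff.item():.3f}")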

red-liu commented 1 month ago

Thank you very much for your reply. My input is a picture with shape (4032, 3024), taken with an iPhone 13, so the height is larger than the width. I did not set pixels_bounds because I don't understand its purpose.

BaderTim commented 1 month ago

Hi there, I encountered a similar error with KITTI-shaped images.

lpiccinelli-eth commented 1 month ago

Thank you for the info. It looks like it fails when the aspect ratio is out of bounds. I will check it and (hopefully) get back to you with a corrected version.

lpiccinelli-eth commented 1 month ago

The error comes from the original DINO code and was solved in this PR; we committed the changes, and it should now be fixed.

Let me know if something is still off.
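For anyone still on an older checkout, the general fix pattern for this class of failure (not necessarily the exact change in the referenced PR) is to pass the integer target grid to F.interpolate instead of a floating-point scale_factor, so non-square inputs cannot lose a row or column to rounding:

    import torch.nn.functional as F

    def resize_pos_embed(patch_pos_embed, w0, h0):
        # patch_pos_embed: (1, dim, M, M) grid of positional embeddings.
        # An explicit size avoids the floor(M * scale) rounding that produced a
        # 42-tall grid while int(h0) was 43 in the report above.
        return F.interpolate(
            patch_pos_embed,
            size=(int(w0), int(h0)),
            mode="bicubic",
            align_corners=False,
        )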

red-liu commented 1 month ago

It seems the issue has been resolved. Thank you very much for your help. By the way, I've encountered a new problem: there's a significant difference between the intrinsics predictions from version 1 and version 2. Do you have any idea what might be causing this discrepancy? For the same picture, the intrinsics predictions are as below:

v2:
    [[[3.8631e+03, 0.0000e+00, 1.5201e+03],
      [0.0000e+00, 4.0359e+03, 2.0109e+03],
      [0.0000e+00, 0.0000e+00, 1.0000e+00]]]

v1:
    [[[1.6742e+03, 0.0000e+00, 1.5174e+03],
      [0.0000e+00, 2.7255e+03, 2.0222e+03],
      [0.0000e+00, 0.0000e+00, 1.0000e+00]]]
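A rough way to sanity-check these numbers, assuming the photo comes from the iPhone 13 main camera with a ~26 mm 35 mm-equivalent focal length (the EXIF FocalLengthIn35mmFilm tag would confirm this), is to convert that equivalent focal length to pixels by matching diagonal fields of view:

    # All values are assumptions for illustration; read them from the photo's EXIF where possible.
    equiv_focal_mm = 26.0                                   # 35 mm-equivalent focal length (assumed)
    full_frame_diag_mm = (36.0 ** 2 + 24.0 ** 2) ** 0.5     # ~43.27 mm
    image_diag_px = (4032 ** 2 + 3024 ** 2) ** 0.5          # 5040 px for this photo

    focal_px = equiv_focal_mm * image_diag_px / full_frame_diag_mm
    print(f"expected focal length: ~{focal_px:.0f} px")     # roughly 3.0e3 px

Both predictions place the principal point close to the image center (1512, 2016); the main difference is in the focal lengths.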