lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

Can't do sky? #69

Open noobtoob4lyfe opened 3 months ago

noobtoob4lyfe commented 3 months ago

Thanks for sharing your great work.
I'm encountering an issue with outdoor shots that show the sky. Some frames will place the sky in the foreground and some will not, causing huge temporal inconsistency. Is there any way to get it to ignore the sky?
image

lpiccinelli-eth commented 3 months ago

Which model are you using?

Many stereo-based GT sources (HRWSI, Cityscapes, or even BlendedMVS) assign really close-by values to the sky region, and sensor-based datasets never have GT on the sky. In addition, we do not model the sky in any way, e.g. with an external segmentation model that sets an arbitrarily large value. All of these combined lead to the model being extremely uncertain on sky regions.

There are different ways you can try:

  1. Hopefully the sky regions are considered low-confidence regions, so you can use the confidence output to mask them out.
  2. Use an external segmentation model: models that segment only the sky are pretty efficient and fast, so you do not need (Grounded) SAM or similar foundation models for this.
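Both options come down to replacing the depth of suspected sky pixels with a fixed value. Here is a minimal NumPy sketch of that idea. The function name `mask_sky`, its parameters, and the threshold are hypothetical and not part of the UniDepth API. It also assumes that higher confidence values mean more certainty, which you should check against the model output you actually get before thresholding.

```python
import numpy as np

def mask_sky(depth, confidence=None, sky_mask=None,
             conf_threshold=0.5, fill_value=np.nan):
    """Return a copy of `depth` with suspected sky pixels set to `fill_value`.

    Hypothetical helper, not part of UniDepth.
    depth:      (H, W) array of predicted depth.
    confidence: optional (H, W) array; ASSUMED higher = more certain.
    sky_mask:   optional (H, W) boolean array from any external sky
                segmentation model (True = sky).
    """
    depth = np.asarray(depth, dtype=np.float32)
    invalid = np.zeros(depth.shape, dtype=bool)
    if confidence is not None:
        # Option 1: treat low-confidence pixels as unreliable.
        invalid |= np.asarray(confidence) < conf_threshold
    if sky_mask is not None:
        # Option 2: trust the external sky segmentation.
        invalid |= np.asarray(sky_mask, dtype=bool)
    out = depth.copy()
    out[invalid] = fill_value
    return out
```

Using a constant `fill_value` (NaN, or an arbitrarily large depth) on every frame should keep the sky from jumping between foreground and background across a video, provided the mask itself is stable from frame to frame.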

Anyway, I see that the output is extremely blurry. Are you using the infer method, or a similar resizing approach? An OOD image shape leads to quite degraded performance, especially for ViT-based architectures. As a general rule of thumb, for geometric tasks it is better to use shorter-edge-based resizing plus padding to fit the aspect ratio given in the config, or to follow the implementation details of the original papers. For instance, the ZoeDepth and DepthAnything implementations brute-force resize the input image to a given, fixed shape, i.e. they modify the original aspect ratio.
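To make the resize-then-pad idea concrete, the sketch below computes only the geometry: the resized size that preserves the aspect ratio, and the padding needed to reach the target shape. The function name `resize_and_pad_params` and the target shape in the usage note are illustrative assumptions, not values from the UniDepth config, and this is one reading of the advice above rather than the repo's actual preprocessing.

```python
def resize_and_pad_params(height, width, target_height, target_width):
    """Compute an aspect-preserving resize plus padding to a target shape.

    Hypothetical helper. Returns ((new_h, new_w), (top, bottom, left, right)),
    where the image is first resized to (new_h, new_w) and then padded.
    """
    # Largest uniform scale at which the image still fits inside the target.
    scale = min(target_height / height, target_width / width)
    new_h = min(target_height, max(1, round(height * scale)))
    new_w = min(target_width, max(1, round(width * scale)))
    # Split the leftover space evenly; any odd pixel goes to bottom/right.
    pad_h = target_height - new_h
    pad_w = target_width - new_w
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)
```

For example, a 1080x1920 frame and a hypothetical 480x640 target give a resized size of `(360, 640)` and padding of `(60, 60, 0, 0)`, so the content is never stretched. Remember to crop the padding from the predicted depth and, for metric depth, to scale the camera intrinsics by the same factor.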

noobtoob4lyfe commented 3 months ago

Thanks for your reply. I'm using this ONNX implementation with the model its author linked on Hugging Face: https://github.com/ibaiGorordo/ONNX-Unidepth-Monocular-Metric-Depth-Estimation Do you think I would have better luck with the main implementation? Thanks for your suggestions.

lpiccinelli-eth commented 3 months ago

I think it would be better to try the GPU model from this repo first, as I suspect that some tiny operational mismatch might have been introduced when "forcing" the model to be ONNX-compliant. If you see the same artifacts there, then the problem comes from the model itself.
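One way to check for such a mismatch is to run the same, identically preprocessed frame through both implementations and compare the two depth maps numerically. The helper below is a hypothetical sketch (`compare_depth_maps` is not part of either repository) and assumes both outputs are already available as arrays of the same shape.

```python
import numpy as np

def compare_depth_maps(depth_a, depth_b, eps=1e-6):
    """Summarize the difference between two depth maps of the same shape.

    Hypothetical helper. Pixels that are NaN or infinite in either map
    are ignored.
    """
    a = np.asarray(depth_a, dtype=np.float64)
    b = np.asarray(depth_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    valid = np.isfinite(a) & np.isfinite(b)
    if not valid.any():
        raise ValueError("no finite pixels to compare")
    abs_diff = np.abs(a[valid] - b[valid])
    # Relative error with respect to the second map (the reference).
    rel_diff = abs_diff / np.maximum(np.abs(b[valid]), eps)
    return {
        "max_abs": float(abs_diff.max()),
        "mean_abs": float(abs_diff.mean()),
        "mean_rel": float(rel_diff.mean()),
    }
```

Small differences are expected from floating-point and runtime details alone. Large differences concentrated in specific regions, such as the sky, would point to the export rather than to the model.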