Open noobtoob4lyfe opened 3 months ago
Which model are you using?
Many stereo-based GTs (HRWSI, CityScapes, or even BlendedMVS) assign really close-by values to the sky region, while sensor-based datasets never have GT on the sky. In addition, we do not model the sky in any way, i.e., with an external segmentation model to set an arbitrarily large value, etc. All of these combined lead to the model being extremely uncertain on sky regions.
There are different ways you can try:
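One option, following the external-segmentation idea above, is to mask the sky with an off-the-shelf segmentation model and override its depth with a fixed far value; that also removes the frame-to-frame flicker, since the sky then gets the same depth everywhere. A minimal sketch, assuming you already have a boolean sky mask (the mask source and the SKY_DEPTH constant are placeholders, not part of this repo):

```python
import numpy as np

SKY_DEPTH = 1000.0  # placeholder: an arbitrarily large metric depth for sky


def suppress_sky(depth: np.ndarray, sky_mask: np.ndarray) -> np.ndarray:
    """Override the predicted depth wherever a segmentation model saw sky.

    depth:    (H, W) metric depth prediction.
    sky_mask: (H, W) boolean array, True on sky pixels (from any
              off-the-shelf sky/semantic segmentation model).
    """
    depth = depth.copy()
    depth[sky_mask] = SKY_DEPTH
    return depth
```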
Anyway, I see that the output is extremely blurry. Are you using the infer method, or a similar resizing strategy?
An OOD image shape leads to quite degraded performance, especially for ViT-based architectures. As a general rule of thumb, for geometric tasks it is better to go for shorter-edge-based resizing and padding to fit the aspect ratio given in the config, or to follow the original papers' implementation details. For instance, the ZoeDepth and DepthAnything implementations brute-force resize the input image to a given, fixed shape, i.e., modifying the original aspect ratio.
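For concreteness, here is a minimal sketch of shorter-edge resizing plus padding, as opposed to brute-force reshaping; the target shorter side and aspect ratio below are placeholders, so take the actual values from the model config:

```python
from PIL import Image


def resize_and_pad(img: Image.Image, short_side: int = 480, ratio: float = 4 / 3):
    """Resize so the shorter edge matches short_side, then pad (never
    stretch) up to the target aspect ratio. short_side and ratio are
    placeholders; use the values from the config."""
    # 1) Scale so the shorter edge equals short_side, keeping aspect ratio.
    w, h = img.size
    scale = short_side / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    img = img.resize((new_w, new_h), Image.BILINEAR)

    # 2) Pad symmetrically with zeros up to the target aspect ratio.
    target_w = max(new_w, round(new_h * ratio))
    target_h = max(new_h, round(new_w / ratio))
    canvas = Image.new("RGB", (target_w, target_h))  # black padding
    canvas.paste(img, ((target_w - new_w) // 2, (target_h - new_h) // 2))
    return canvas
```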
Thanks for your reply. I'm using this ONNX implementation (https://github.com/ibaiGorordo/ONNX-Unidepth-Monocular-Metric-Depth-Estimation) with the model he linked on Hugging Face. Do you think I would have better luck with the main implementation? Thanks for your suggestions.
I think trying the GPU model from this repo first would be better. If you see the same artifacts, then the problem is in the model itself; otherwise, I suspect some tiny operational mismatch was introduced when "forcing" the model to be ONNX-compliant.
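For reference, running the PyTorch model from this repo looks roughly like the snippet below (adapted from memory of the README; double-check the exact model name and prediction keys there):

```python
import numpy as np
import torch
from PIL import Image
from unidepth.models import UniDepthV1

# Model id as published on Hugging Face (verify against the README).
model = UniDepthV1.from_pretrained("lpiccinelli/unidepth-v1-vitl14")
model = model.to("cuda").eval()

rgb = torch.from_numpy(
    np.array(Image.open("frame.png").convert("RGB"))
).permute(2, 0, 1)  # (C, H, W)

with torch.no_grad():
    predictions = model.infer(rgb)  # infer() handles resizing internally

depth = predictions["depth"]  # metric depth map
```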
Thanks for sharing your great work.
I'm encountering an issue with outdoor shots that show the sky. Some frames will place the sky in the foreground and some will not, causing huge temporal inconsistency. Is there any way to get it to ignore the sky?