isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Converting GT labels into Disparity Space #138

Open sami-automatic opened 3 years ago

sami-automatic commented 3 years ago

Thank you for your contributions and for your amazing work.

Your paper mentions that you perform prediction in disparity space in order to handle representation, scale, and shift ambiguities across multiple datasets. However, I could not figure out how you convert ground-truth depths into disparity maps before applying your loss functions.

I know the depth-disparity relation in the form depth = (baseline × focal length) / disparity, so for calculating disparity: disparity = (baseline × focal length) / depth. But how do I decide on the baseline and focal length parameters? (For instance, DIML-Indoor ground truths are provided as 16-bit PNG depths; how do I convert them to disparity space before feeding them to the loss?)

Moreover, the paper mentions "We shift and scale the ground-truth disparity to the range [0, 1] for all datasets." Is this based on statistics across all datasets or on a fixed range?
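To make the question concrete, here is the kind of conversion I currently have in mind for a DIML-Indoor 16-bit depth PNG (just my own guess, not something taken from the paper or this repo; the per-image min-max normalization and the zero-means-invalid convention are assumptions on my part):

```python
import cv2
import numpy as np

# Hypothetical DIML-Indoor sample: 16-bit PNG storing depth (units cancel after normalization).
depth = cv2.imread("diml_indoor_depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

valid = depth > 0                       # assume zero pixels are missing measurements
disparity = np.zeros_like(depth)
disparity[valid] = 1.0 / depth[valid]   # proportional to true disparity; baseline * focal omitted

# Shift and scale the valid disparities to [0, 1] (per-image min-max -- my assumption).
d_min, d_max = disparity[valid].min(), disparity[valid].max()
disparity[valid] = (disparity[valid] - d_min) / max(d_max - d_min, 1e-8)
```

If the baseline and focal length were known, multiplying by them would only rescale the disparity by a constant, which the min-max normalization removes anyway; that is why I suspect they may not be needed, but I would like to confirm.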

Thank you for your guidance and precious time.

InfiniteLife commented 3 years ago

I've been thinking about the same thing, so I won't create a new issue. I've been reading a lot of issues in this repository and in DPT, as well as the paper, since I'm planning on training DPT. As I understand it, training happens in disparity space, and as mentioned in one of the issues, disparity is proportional to inverse depth.

  1. So, for example, I have a depth dataset from a game engine which provides depth encoded in the range [0, 255]. To convert it to inverse depth I would simply compute 1.0 / D, which makes it proportional to disparity and in the [0, 1] range. After that, will this dataset be good to go for training?
  2. If we have a disparity dataset, do we do the same conversion to scale the dataset to the range [0, 1]?
  3. If we have an SfM dataset, do we do the same step with the motivation from 1?
Twilight89 commented 1 year ago

> I've been thinking about the same thing, so I won't create a new issue. I've been reading a lot of issues in this repository and in DPT, as well as the paper, since I'm planning on training DPT. As I understand it, training happens in disparity space, and as mentioned in one of the issues, disparity is proportional to inverse depth.
>
>   1. So, for example, I have a depth dataset from a game engine which provides depth encoded in the range [0, 255]. To convert it to inverse depth I would simply compute 1.0 / D, which makes it proportional to disparity and in the [0, 1] range. After that, will this dataset be good to go for training?
>   2. If we have a disparity dataset, do we do the same conversion to scale the dataset to the range [0, 1]?
>   3. If we have an SfM dataset, do we do the same step with the motivation from 1?

Hi, did you figure out how to "shift and scale the ground-truth disparity to the range [0, 1] for all datasets"? I want to reproduce DPT (MiDaS 3.0) too and do not know how to preprocess the datasets.

CJCHEN1230 commented 1 year ago

Same problem

puyiwen commented 6 months ago

> I've been thinking about the same thing, so I won't create a new issue. I've been reading a lot of issues in this repository and in DPT, as well as the paper, since I'm planning on training DPT. As I understand it, training happens in disparity space, and as mentioned in one of the issues, disparity is proportional to inverse depth.
>
>   1. So, for example, I have a depth dataset from a game engine which provides depth encoded in the range [0, 255]. To convert it to inverse depth I would simply compute 1.0 / D, which makes it proportional to disparity and in the [0, 1] range. After that, will this dataset be good to go for training?
>   2. If we have a disparity dataset, do we do the same conversion to scale the dataset to the range [0, 1]?
>   3. If we have an SfM dataset, do we do the same step with the motivation from 1?
>
> Hi, did you figure out how to "shift and scale the ground-truth disparity to the range [0, 1] for all datasets"? I want to reproduce DPT (MiDaS 3.0) too and do not know how to preprocess the datasets.

Hi, do you know how to train on a metric depth dataset (DIML) and a relative depth dataset (ReDWeb) together? I have the same question.

CJCHEN1230 commented 6 months ago

@puyiwen Hi, maybe I can answer this question. I do train my model using both metric depth and relative depth datasets. The approach is simple: for metric depth, I just invert it via 1/depth and then scale to [0, 1]; for disparity data, I just scale to [0, 1] without inverting. Actually, I don't really care about the focal length and baseline, because I always scale to [0, 1]. So the only thing I need to know is whether I should invert the data before scaling: if the label is provided as depth, I invert it; if the label is provided as disparity, I just scale it.
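In code, the recipe above might look roughly like this (a sketch of what I described, not code from this repository; the invalid-pixel masking is something I add on top):

```python
import numpy as np

def to_normalized_disparity(label: np.ndarray, label_is_depth: bool) -> np.ndarray:
    """Turn a ground-truth label into disparity-like values scaled to [0, 1].

    label_is_depth=True  -> metric/relative depth: invert first (disparity ~ 1/depth).
    label_is_depth=False -> already disparity (e.g. stereo data): scale only.
    """
    label = label.astype(np.float32)
    valid = label > 0                    # treat non-positive pixels as missing
    out = np.zeros_like(label)

    if label_is_depth:
        out[valid] = 1.0 / label[valid]  # baseline and focal length drop out after scaling
    else:
        out[valid] = label[valid]

    lo, hi = out[valid].min(), out[valid].max()
    out[valid] = (out[valid] - lo) / max(hi - lo, 1e-8)
    return out
```

Depth-style labels (DIML, game-engine depth, SfM depth) would go through the `label_is_depth=True` branch; stereo disparity labels would go through the other one.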

puyiwen commented 6 months ago

@CJCHEN1230 Thank you for your reply. I convert all the metric depth to relative depth with the big Depth Anything pretrained model, and I scale all relative depths to [0, 1] for training. Will the results differ much between what I do and what you do?

Feobi1999 commented 2 months ago

> @CJCHEN1230 Thank you for your reply. I convert all the metric depth to relative depth with the big Depth Anything pretrained model, and I scale all relative depths to [0, 1] for training. Will the results differ much between what I do and what you do?

Hello, I do the same thing, but when I use the original [0, 255] grayscale image as the label, it works. When I scale to [0, 1], the model cannot predict relative depth.

CJCHEN1230 commented 3 hours ago

@puyiwen I also generate some labels from Depth Anything, and I just scale them to [0, 1]. So I think we are doing the same thing.

CJCHEN1230 commented 2 hours ago

@Feobi1999
Which loss function do you use, the trimmed MAE or the SSI MSE? Based on my experience, one possible cause is your regularization settings. Maybe you can set the regularization term to zero to check whether it works, then tune it to find a better setting.
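For reference, here is a minimal sketch of the SSI MSE I mean (my own reimplementation of the per-image scale-and-shift alignment described in the MiDaS paper, not the authors' training code; the gradient-matching regularizer is left out):

```python
import torch

def ssi_mse_loss(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Scale-and-shift-invariant MSE over valid pixels (a sketch, not the official code).

    pred, target, mask: tensors of shape (B, H, W); mask is 1 where ground truth is valid.
    Per image, solve min_{s,t} sum(mask * (s*pred + t - target)^2) in closed form,
    then average the squared residuals of the aligned prediction.
    """
    pred, target = pred.flatten(1), target.flatten(1)
    mask = mask.flatten(1).float()
    n = mask.sum(dim=1).clamp(min=1.0)

    # 2x2 normal equations per image for the least-squares scale s and shift t.
    a00 = (mask * pred * pred).sum(dim=1)
    a01 = (mask * pred).sum(dim=1)
    b0 = (mask * pred * target).sum(dim=1)
    b1 = (mask * target).sum(dim=1)
    det = (a00 * n - a01 * a01).clamp(min=1e-8)
    s = (n * b0 - a01 * b1) / det
    t = (a00 * b1 - a01 * b0) / det

    residual = mask * (s.unsqueeze(1) * pred + t.unsqueeze(1) - target) ** 2
    return (residual.sum(dim=1) / n).mean()
```

In the paper, the total loss adds a multi-scale gradient-matching term on the disparity residuals, weighted by a factor α (0.5 in the paper, if I remember correctly); that weight is the regularization knob I mean above, and setting it to zero is the quickest sanity check.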