[Open] sami-automatic opened this issue 3 years ago
I've been thinking about the same thing, so I won't create a new issue. I've been reading a lot of issues across this repository and DPT, as well as the paper, since I'm planning on training DPT. As I understand it, training happens in disparity space, and, as was mentioned in one of the issues, disparity is proportional to inverse depth.

- So, for example, I have a depth dataset from a game engine which provides depth encoded in the range [0, 255]. To convert it to inverse depth I would simply compute 1.0 / D, which makes it proportional to disparity and puts it in the [0, 1] range. After that, is the dataset good to go for training? (See the sketch after this list.)
- If we have a disparity dataset, do we do the same conversion to scale the dataset to the range [0, 1]?
- If we have an SfM dataset, do we do the same step, with motivation from [1]?
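For concreteness, the conversion I have in mind for the first bullet looks roughly like this. It is only a sketch with NumPy; the function name and the treatment of zero pixels as invalid are my own assumptions, not anything from the MiDaS code:

```python
import numpy as np

def game_depth_to_inverse_depth(depth_u8):
    """First bullet above: 8-bit depth D in [0, 255] (larger = farther)
    converted to inverse depth, which is proportional to disparity."""
    D = depth_u8.astype(np.float32)
    valid = D > 0                      # D == 0 would blow up 1.0 / D
    inv_depth = np.zeros_like(D)
    inv_depth[valid] = 1.0 / D[valid]  # lands in (0, 1] for D in [1, 255]
    return inv_depth, valid
```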
Hi, did you find a workaround for how to "shift and scale the ground-truth disparity to the range [0, 1] for all datasets"? I want to reproduce DPT (MiDaS 3.0) too and do not know how to preprocess the datasets.
Same problem
Hi, do you know how to train a metric depth dataset (DIML) and a relative depth dataset (RedWeb) together? I have the same question.
@puyiwen Hi, maybe I can answer this question. I train my model using both metric depth and relative depth datasets. The approach is simple: for metric depth, I invert it with 1/depth and then scale to [0, 1]; for disparity data, I just scale to [0, 1] without inverting. Actually, I don't really care about the focal length and baseline, because I always scale to [0, 1]. So the only thing I need to know is whether I should invert the data before scaling: if the label is provided as depth, I invert it; if the label is provided as disparity, I just scale it.
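Roughly like this, as a sketch of my own pipeline (NumPy; the is_depth flag and the valid-pixel handling are just how I organize my loader, not anything from MiDaS):

```python
import numpy as np

def to_normalized_disparity(label, is_depth, eps=1e-6):
    """Bring either a metric-depth label or a disparity label to a
    disparity-like target in [0, 1]. Sketch only."""
    x = label.astype(np.float32)
    valid = x > 0
    out = np.zeros_like(x)
    if is_depth:
        out[valid] = 1.0 / x[valid]   # depth -> inverse depth (~ disparity)
    else:
        out[valid] = x[valid]         # already disparity-like, keep as-is
    # per-image scaling of the valid pixels to [0, 1]
    if valid.any():
        lo, hi = out[valid].min(), out[valid].max()
        out[valid] = (out[valid] - lo) / max(hi - lo, eps)
    return out, valid
```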
@CJCHEN1230 Thank you for your reply. I convert all the metric depth to relative depth with the Depth Anything big pretrained model, and I scale all the relative depth to [0, 1] for training. Will the effect be much different between what I do and what you do?
Hello, I do the same thing, but when I use the original [0, 255] grayscale image as the label, it works. When I scale to [0, 1], it cannot predict relative depth.
@puyiwen I also make some labels from Depth Anything; I just scale them from 0 to 1. So I think we do the same thing.
@Feobi1999
Which loss function do you use? The Trimmed MAE or the SSI MSE? Based on my experience, one of the reasons may be your regularization settings. Maybe you can set the regularization term to zero to check if it works, then tune it to find a better setting.
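For reference, by SSI MSE I mean: align the prediction to the ground truth with a per-image least-squares scale and shift, then take the MSE over valid pixels. A sketch in PyTorch under my own assumptions (the gradient-matching regularizer from the MiDaS paper is left out here, which is the term I suggest zeroing first):

```python
import torch

def ssi_mse_loss(pred, target, mask):
    """Scale-and-shift-invariant MSE (sketch).
    pred, target: (B, H, W) disparity-like maps; mask: (B, H, W) bool.
    Per image, solve least squares for scale s and shift t so that
    s * pred + t best matches target on valid pixels, then take MSE."""
    pred = pred.flatten(1)
    target = target.flatten(1)
    m = mask.flatten(1).float()

    n = m.sum(dim=1).clamp(min=1.0)
    # 2x2 normal equations: [[a00, a01], [a01, a11]] @ [s, t] = [b0, b1]
    a00 = (m * pred * pred).sum(dim=1)
    a01 = (m * pred).sum(dim=1)
    a11 = n
    b0 = (m * pred * target).sum(dim=1)
    b1 = (m * target).sum(dim=1)

    det = a00 * a11 - a01 * a01
    det = torch.where(det.abs() < 1e-8, torch.full_like(det, 1e-8), det)
    s = (a11 * b0 - a01 * b1) / det   # Cramer's rule
    t = (a00 * b1 - a01 * b0) / det

    aligned = s.unsqueeze(1) * pred + t.unsqueeze(1)
    return ((m * (aligned - target) ** 2).sum(dim=1) / n).mean()
```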
Thank you for your contributions and for your amazing work.
Your paper mentions that you perform prediction in disparity space in order to handle representation, scale, and shift ambiguities across multiple datasets; however, I could not figure out how you convert the ground-truth depths into disparity maps before applying your loss functions.
I know the depth-disparity relation in the following form: depth = (baseline * focal_length) / disparity, and thus, for calculating disparity: disparity = (baseline * focal_length) / depth. But how do I decide on the baseline and focal length parameters? (For instance, DIML-Indoor ground truths are provided as 16-bit PNG depths; how do I convert them to disparity space before feeding them to the loss?)
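For concreteness, here is what I am currently trying. It is only a sketch, and I am assuming that a per-image shift-and-scale normalization makes the unknown baseline * focal_length factor irrelevant, which is exactly the assumption I would like you to confirm (the file name is hypothetical):

```python
import numpy as np
from PIL import Image

# Sketch: load a DIML-Indoor style 16-bit PNG depth map and build a
# [0, 1] disparity target without knowing baseline or focal length.
depth_raw = np.asarray(Image.open("example_depth.png"), dtype=np.float32)  # hypothetical file
valid = depth_raw > 0                        # 0 = missing measurement

disparity = np.zeros_like(depth_raw)
disparity[valid] = 1.0 / depth_raw[valid]    # proportional to (baseline * f) / depth

# per-image shift and scale to [0, 1]; a constant baseline * f factor cancels here
lo, hi = disparity[valid].min(), disparity[valid].max()
disparity[valid] = (disparity[valid] - lo) / max(hi - lo, 1e-6)
```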
Moreover, the paper mentions: "We shift and scale the ground-truth disparity to the range [0, 1] for all datasets." Is this based on statistics across all datasets, or according to a fixed range?
Thank you for your guidance and precious time.