ifnspaml / SGDepth

[ECCV 2020] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance
MIT License

How to eval depth with Cityscapes #18

Open ChengJianjia opened 3 years ago

ChengJianjia commented 3 years ago

I want to evaluate depth on the Cityscapes dataset. What parameters should I pay attention to? Thanks.

klingner commented 3 years ago

Hi,

so mainly, Cityscapes has a different aspect ratio than KITTI, which is something I would look out for when resizing the Cityscapes images. It might be possible to pass them through the network at the same resolution as the KITTI images, but with a significantly altered aspect ratio. Alternatively, since the network is fully convolutional, you may also try to pass the Cityscapes images through with an unaltered aspect ratio.
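The two resizing options could be sketched as follows (a minimal sketch; the concrete sizes are assumptions, with 640x192 standing in for the KITTI training resolution and 2048x1024 for a Cityscapes frame):

```python
from PIL import Image

# Placeholder frame standing in for a Cityscapes image (real frames are 2048x1024).
img = Image.new("RGB", (2048, 1024))

# Option 1: force the KITTI-like training resolution, distorting the 2:1 ratio.
kitti_like = img.resize((640, 192), Image.LANCZOS)

# Option 2: keep the 2:1 Cityscapes ratio; the network is fully convolutional,
# so any width/height divisible by 32 should pass through.
ratio_kept = img.resize((512, 256), Image.LANCZOS)
```

Either way, the choice should be kept consistent between training resolution and evaluation resolution.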

If you also want to evaluate the depth performance on Cityscapes, then it is important to note that Cityscapes only provides ground truth depth maps calculated from Multi-View Stereo, so be careful when drawing conclusions between the performance on KITTI and Cityscapes. Also, these depth maps have a higher coverage of the image pixels than the KITTI depth maps (only sparse LiDAR beams). Finally, the images in the Cityscapes dataset are stored in a slightly different format than the KITTI images (I think it is described in the README of Cityscapes somewhere).
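The format difference can be sketched like this (a sketch based on the Cityscapes README: the `disparity` PNGs are uint16 with `p == 0` marking invalid pixels and `disparity = (p - 1) / 256` otherwise; the baseline and focal length below are the approximate stereo rig values, the exact numbers are in the per-frame camera JSON files):

```python
import numpy as np

def disparity_png_to_depth(p, baseline=0.209313, focal=2262.52):
    """Decode a Cityscapes disparity PNG (uint16 array) into metric depth."""
    p = p.astype(np.float64)
    disp = (p - 1.0) / 256.0
    depth = np.zeros_like(disp)
    # p == 0 is invalid; p == 1 encodes zero disparity, so exclude both.
    valid = p > 1.0
    depth[valid] = baseline * focal / disp[valid]
    return depth
```

Invalid pixels come out as depth 0 and should be masked out before computing any metrics.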

Hope this helps!

ChengJianjia commented 3 years ago

Thanks for the reply. I have evaluated depth on Cityscapes, but the performance does not seem good. I will try to train it on Cityscapes.

klingner commented 3 years ago

Thank you for sharing your initial results,

maybe one thought: I would not directly compare a performance metric between KITTI and Cityscapes due to the large structural difference in the ground truths. Have you looked at some images and tried to judge qualitatively what the results look like? This might help in the evaluation process. Training directly on Cityscapes is, however, likely to improve performance with the right set of hyperparameters.

ChengJianjia commented 3 years ago

Thanks for the reply. I have used inference.py to output depth prediction images for the Cityscapes dataset, and the output images look reasonable. Then I used the two models provided in the code to evaluate on Cityscapes, with the following results:

depth_full.pth:
{'abs_rel': 0.9437683314528279, 'sq_rel': 46.842625193553495, 'rmse': 13.539455869561722, 'rmse_log': 0.5114394837606806}
{'delta1': 0.46745538016478827, 'delta2': 0.7847414059078959, 'delta3': 0.9212443854481077}

depth_only.pth:
{'abs_rel': 0.838633564325633, 'sq_rel': 36.69276755167625, 'rmse': 13.141963201280008, 'rmse_log': 0.5032015791397804}
{'delta1': 0.4857175435314797, 'delta2': 0.7851770513177866, 'delta3': 0.9191037496773082}

Strangely, depth_only.pth performs better than depth_full.pth.

About the hyperparameters, I made three changes:

1. adjusted the mask_fn and clamb_fn;
2. in mytransform.py, when generating depth_gt, I used this line of code:
   `sample[key][sample[key] > 1.0] = 0.209313 * 2262.52 / ((np.array(sample[key][sample[key] > 1.0]).astype(np.float) - 1.0) / 256.)`
3. resized the input images to 256×512.

I don't know if there are any other parameters that need to be changed.
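For reference, the metric names above correspond to the standard depth evaluation measures from Eigen et al.; a minimal sketch of how they are typically computed (the function name is hypothetical, and `gt`/`pred` are assumed to be 1-D arrays of valid metric depths after masking):

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard monocular depth metrics over valid pixels."""
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean((gt - pred) ** 2 / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "delta1": np.mean(thresh < 1.25),
        "delta2": np.mean(thresh < 1.25 ** 2),
        "delta3": np.mean(thresh < 1.25 ** 3),
    }
```

Whether median scaling is applied to `pred` before computing these, and which pixels are masked, strongly affects the numbers, so those choices should match between the KITTI and Cityscapes runs.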

klingner commented 3 years ago

Well, I think at the moment, there is not really a standard on how to evaluate on Cityscapes. Some thoughts regarding your changes:

  1. It could make sense to exclude the region at the bottom of the Cityscapes images containing the ego-vehicle from the evaluation. Maybe this is already considered by you in the mask_fn?
  2. This line of code seems correct to me; I would do it the same way.
  3. I am not sure if this is the optimal size, but it seems close enough to the KITTI image size to yield meaningful results.
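Point 1 could be sketched as follows (the function name and the crop fraction are hypothetical choices for illustration, not values from the SGDepth code):

```python
import numpy as np

def ego_vehicle_mask(height, width, crop_fraction=0.8):
    """Boolean mask keeping only the top part of the frame, excluding the
    ego-vehicle hood visible at the bottom of Cityscapes images."""
    mask = np.zeros((height, width), dtype=bool)
    mask[: int(height * crop_fraction), :] = True
    return mask

valid = ego_vehicle_mask(1024, 2048)
# Combine with the ground-truth validity mask before computing metrics, e.g.:
# eval_mask = valid & (gt_depth > 0)
```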

Although I did not observe the depth-only model to be better than the depth-full model before, it might be that due to the domain shift, the results differ from the ones obtained on KITTI. In the end, the model is still optimized for operation on KITTI, so it might also be interesting to see whether training both models on Cityscapes yields the same relation.

ZhuYingJessica commented 2 years ago

@ChengJianjia Hi, how do you compute the errors on Cityscapes? Could you share a link to the GT depth data of the Cityscapes dataset? Thanks a lot!