noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License
540 stars, 60 forks

Thanks for your work. I tried training it on the monocular endoscopy dataset EndoSLAM, but the results were poor. #159

Open wza527 opened 1 week ago

wza527 commented 1 week ago

The color images in the dataset are 320×320, for example:

[image: 0000000]

The ground-truth depth images are saved as 32-bit .png files, for example:

[image: 0000000]

The data loader I wrote is shown in these screenshots:

[image] [image] [image] [image] [image]

My evaluation results were very bad:

[image]

What do you think is the reason? I changed max_depth to 100.

noahzn commented 1 week ago

Hi, I haven't worked with this kind of dataset. What is the actual range of your ground truth? Maybe 100 is too large for your dataset.

wza527 commented 1 week ago

> Hi, I haven't worked with this kind of dataset. What is the actual range of your ground truth? Maybe 100 is too large for your dataset.

Thanks for your reply. The actual range of my dataset is 0-100 mm, but the depth parameters in your code are in meters. How should I change them? Thanks again for your quick reply.

noahzn commented 1 week ago

If the range is 0-100 mm, please make sure that your evaluation code handles this range correctly.
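One way to check this is to look at the range the ground truth actually covers and to keep every threshold in the same unit. The sketch below is only illustrative: `mm_to_m`, `depth_range`, and the sample values are hypothetical, and it assumes the loader already yields depth in millimeters.

```python
def mm_to_m(depth_mm):
    """Convert depth values from millimeters to meters."""
    return [d / 1000.0 for d in depth_mm]


def depth_range(depths, eps=0.0):
    """Return (min, max) over valid pixels, treating values <= eps as missing."""
    valid = [d for d in depths if d > eps]
    if not valid:
        raise ValueError("no valid depth values")
    return min(valid), max(valid)


# Hypothetical flattened ground-truth depth map in millimeters (0 = no reading).
gt_mm = [0.0, 12.5, 48.0, 99.0, 30.0]

lo_mm, hi_mm = depth_range(gt_mm)
lo_m, hi_m = depth_range(mm_to_m(gt_mm))

print(f"range in mm: {lo_mm} to {hi_mm}")
print(f"range in m:  {lo_m} to {hi_m}")
```

If the ground truth stays in millimeters, a maximum of 100 only makes sense when the evaluation thresholds are also read as millimeters; if the data is converted to meters, the same limit becomes 0.1.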

wza527 commented 1 week ago

  1. Where do I need to change the evaluation code so that it handles this range correctly?
  2. I ran test_simple.py and the results were not good. I used your lite-mono-small ImageNet pre-trained weights, whose input size is 640×192, while my data is 320×320. Will this affect the training results?

noahzn commented 1 week ago

  1. As you can see here, the MIN and MAX depth values are defined in the evaluation code. It can be a problem if your actual ground truth is not in this range.
  2. Any input size is compatible with the pretrained ImageNet weights.
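
To illustrate point 1, here is a minimal sketch that loosely follows a monodepth2-style evaluation (it is not this repository's exact code). It drops ground-truth pixels outside the depth range, rescales the prediction by the ratio of medians, and clamps it before computing errors. The function name `evaluate_depth_values`, the defaults, and the sample values are assumptions; inputs are flattened depth values in one shared unit.

```python
import math
from statistics import median


def evaluate_depth_values(gt, pred, min_depth=1e-3, max_depth=80.0):
    """Return (abs_rel, rmse, a1) after range masking and median scaling."""
    # Keep only pixels whose ground truth lies strictly inside the range.
    pairs = [(g, p) for g, p in zip(gt, pred) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no ground-truth values inside the depth range")
    gts = [g for g, _ in pairs]
    preds = [p for _, p in pairs]

    # Self-supervised monocular depth is scale-ambiguous, so align the medians,
    # then clamp the rescaled prediction to the same range.
    ratio = median(gts) / median(preds)
    preds = [min(max(p * ratio, min_depth), max_depth) for p in preds]

    n = len(gts)
    abs_rel = sum(abs(g - p) / g for g, p in zip(gts, preds)) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in zip(gts, preds)) / n)
    a1 = sum(max(g / p, p / g) < 1.25 for g, p in zip(gts, preds)) / n
    return abs_rel, rmse, a1


# Hypothetical values: ground truth in mm, prediction correct up to scale.
gt_mm = [10.0, 20.0, 40.0, 90.0]
pred = [0.5, 1.0, 2.0, 4.5]
print(evaluate_depth_values(gt_mm, pred, min_depth=1e-3, max_depth=100.0))
```

Pixels whose ground truth falls outside the range are dropped silently, so a range expressed in the wrong unit can discard most or all of the data without any warning other than poor metrics.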
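
wza527 commented 1 week ago

> 1. As you can see here, the MIN and MAX depth values are defined in the evaluation code. It can be a problem if your actual ground truth is not in this range.
> 2. Any input size is compatible with the pretrained ImageNet weights.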

I have changed it there. I think the problem is with the training itself, but I don't know what is causing the poor training results.

noahzn commented 1 week ago

If your camera intrinsics are correct... How do the curves and visualizations look during training? Have you checked them using TensorBoard?
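
For reference, monodepth2-style data loaders store the intrinsics normalized by the image size and rescale them to the training resolution; whether a custom loader follows the same convention is an assumption worth checking. The sketch below shows that convention. The function names and the focal length and principal point values are placeholders, not the EndoSLAM calibration.

```python
def normalize_intrinsics(fx, fy, cx, cy, calib_width, calib_height):
    """Build a 4x4 intrinsics matrix normalized by the calibration image size."""
    return [
        [fx / calib_width, 0.0, cx / calib_width, 0.0],
        [0.0, fy / calib_height, cy / calib_height, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def scale_intrinsics(k_norm, width, height):
    """Rescale normalized intrinsics to a concrete training resolution."""
    k = [row[:] for row in k_norm]  # copy so the normalized matrix is untouched
    k[0] = [v * width for v in k[0]]
    k[1] = [v * height for v in k[1]]
    return k


# Placeholder calibration for a 320x320 image (NOT real EndoSLAM numbers).
k_norm = normalize_intrinsics(fx=160.0, fy=160.0, cx=160.0, cy=160.0,
                              calib_width=320, calib_height=320)
k_train = scale_intrinsics(k_norm, width=320, height=320)

print(k_norm[0])
print(k_train[0])
```

A quick sanity check is that the normalized principal point is close to 0.5 for a roughly centered camera, and that the normalization used the resolution at which the calibration was made rather than the resized training resolution.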

wza527 commented 1 week ago

> If your camera intrinsics are correct... How do the curves and visualizations look during training? Have you checked them using TensorBoard?

The training curves are as follows:

[image]

The visualizations are as follows:

[image: 0000002_disp] [image: 0000002_disp]

noahzn commented 1 week ago

The visualization doesn't look bad, but your evaluation results are very poor. I think the problem still comes from your dataset. I have seen several papers that use Lite-Mono for endoscopy depth estimation, and they report good results.

wza527 commented 1 week ago

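> The visualization doesn't look bad, but your evaluation results are very poor. I think the problem still comes from your dataset. I have seen several papers that use Lite-Mono for endoscopy depth estimation, and they report good results.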

OK. I am using the public EndoSLAM dataset, so the data itself should be fine. Maybe there is something wrong with the loader I use to process the data. Can you tell me the titles of the papers?

noahzn commented 1 week ago

https://scholar.google.com/scholar?hl=en&as_sdt=2005&sciodt=0%2C5&cites=6869562733583933958&scipsc=1&q=endoscopy&btnG=

wza527 commented 1 week ago

Thanks for your help.

noahzn commented 6 days ago

You are welcome. Please update if you find the reason.

wza527 commented 6 days ago

> You are welcome. Please update if you find the reason.

OK