hisfog / SfMNeXt-Impl

[AAAI 2024] Official implementation of "SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation", and more.
MIT License

Question about disp map #29

Shaw-Way closed this issue 9 months ago

Shaw-Way commented 9 months ago

Hi, author, thanks for your remarkable work. I tried training with the following settings:

--dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 16 --num_epochs 25 --model_dim 32 --patch_size 16 --query_nums 120 --scheduler_step_size 15 --eval_mono --post_process --min_depth 0.001 --max_depth 80.0 --backbone resnet18_lite

As training progresses, the loss gradually decreases and the metrics improve. However, the resulting disp maps look strange. Here are the disp maps obtained at the seventh epoch:

[image]

They seem to render nearby objects in dark colors and distant objects in bright colors. Is this normal?

hisfog commented 9 months ago

Yes, that's expected. The "disp map" here is actually a depth map.
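
To illustrate why a depth map looks inverted compared with the usual disparity visualization, here is a minimal sketch in plain Python. The sample depths and the `normalize` helper are made up for illustration and are not taken from the repository.

```python
def normalize(values):
    """Rescale values linearly to [0, 1], as a colormap does before mapping to brightness."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical depths in metres for three pixels: near, mid-range, far.
depths = [2.0, 10.0, 80.0]

# Disparity (inverse depth) is large for nearby pixels and small for distant ones.
disparities = [1.0 / d for d in depths]

depth_brightness = normalize(depths)
disp_brightness = normalize(disparities)

print("depth map brightness:    ", depth_brightness)
print("disparity map brightness:", disp_brightness)
```

With the same colormap, the nearest pixel is the darkest in the depth map and the brightest in the disparity map, which matches the appearance described in the question.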

Shaw-Way commented 9 months ago

Why does the code output a depth map instead of an inverse depth map, like other self-supervised monocular depth estimation models? I don't quite understand because the computation of the loss doesn't seem to change much.

hisfog commented 9 months ago

Emm, two main reasons: one is to accommodate depth bins, and the other is for supervised fine-tuning, as the output of most supervised methods is depth.
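
The reply does not spell out how depth bins lead to a depth output, so the following is only a sketch of one common formulation used by bin-based depth estimators such as AdaBins: the depth range is split into bins, and each pixel's depth is the probability-weighted sum of the bin centers. The function names, bin widths, and logits are assumptions for illustration, not the repository's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(widths, min_depth, max_depth):
    """Turn normalized bin widths (summing to 1) into bin centers in metres."""
    centers = []
    edge = min_depth
    for w in widths:
        width = w * (max_depth - min_depth)
        centers.append(edge + width / 2.0)
        edge += width
    return centers


def depth_from_bins(logits, centers):
    """Per-pixel depth as the expectation of the bin centers under the softmax."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical example: three bins over [0, 80] m and one pixel with uniform logits.
centers = bin_centers([0.25, 0.25, 0.5], min_depth=0.0, max_depth=80.0)
depth = depth_from_bins([0.0, 0.0, 0.0], centers)
print(centers, depth)
```

Because the bins partition the depth range itself, the weighted sum is already a depth in metres, so a formulation like this has no need for a separate disparity-to-depth conversion.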

Shaw-Way commented 9 months ago

https://github.com/hisfog/SfMNeXt-Impl/blob/bbc5bb989e51eae64848524450404eb6995fef40/trainer.py#L400

Thank you for your prompt reply. I noticed that you commented out this line and used the network's output directly to reconstruct the target image. I think this is why your model outputs a depth map directly.
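
For context, here is a sketch of the two pipelines being contrasted. The linked line is not quoted in this thread; the code below assumes it is a Monodepth2-style `disp_to_depth` conversion, which maps a sigmoid output in [0, 1] to a depth between `min_depth` and `max_depth`. `depth_for_warping` is a hypothetical helper written for this illustration, and the defaults reuse the `--min_depth` and `--max_depth` values from the first post.

```python
def disp_to_depth(disp, min_depth, max_depth):
    """Monodepth2-style conversion: sigmoid output in [0, 1] -> (scaled disparity, depth)."""
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth


def depth_for_warping(network_output, output_is_depth, min_depth=0.001, max_depth=80.0):
    """Hypothetical helper: return the depth used to reproject the source view."""
    if output_is_depth:
        # Conversion skipped (assumed to correspond to the commented-out line):
        # the network output is treated as depth directly.
        return network_output
    # Conventional self-supervised pipeline: convert disparity to depth first.
    _, depth = disp_to_depth(network_output, min_depth, max_depth)
    return depth


# A disparity of 0 maps to max_depth and a disparity of 1 maps to min_depth.
far = depth_for_warping(0.0, output_is_depth=False)
near = depth_for_warping(1.0, output_is_depth=False)
direct = depth_for_warping(12.5, output_is_depth=True)
print(far, near, direct)
```

Under this assumption, the photometric loss is unchanged in both cases, since it only depends on the depth passed to the reprojection step. What changes is the meaning of the raw network output, which is what the visualization shows.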