Closed: Shaw-Way closed this issue 9 months ago.
Yes, that's expected; the "disp" map here is actually a depth map.
Why does the code output a depth map instead of an inverse depth map, like other self-supervised monocular depth estimation models? I don't quite understand because the computation of the loss doesn't seem to change much.
Emm, two main reasons: one is to accommodate depth bins, and the other is supervised fine-tuning, since the output of most supervised methods is depth.
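For readers unfamiliar with depth bins: the idea (popularized by AdaBins-style heads) is to discretize the depth range into adaptive bins and output a per-pixel expectation over bin centers, which is naturally a metric depth rather than a disparity. The sketch below is illustrative only, with assumed function and parameter names; it is not the repository's actual head.

```python
import numpy as np

def depth_from_bins(bin_logits, prob_logits, min_depth=0.001, max_depth=80.0):
    """Illustrative depth-bins decoding (assumed names, not the repo's code):
    predicted bin widths are normalized to span [min_depth, max_depth], and
    per-pixel softmax weights over the bin centers give metric depth."""
    widths = np.exp(bin_logits)                          # force positive widths
    widths = widths / widths.sum() * (max_depth - min_depth)
    edges = min_depth + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])             # one center per bin
    p = np.exp(prob_logits - prob_logits.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)                # softmax over bins
    return (p * centers).sum(axis=-1)                    # expected depth, meters

rng = np.random.default_rng(0)
depth = depth_from_bins(rng.normal(size=64), rng.normal(size=(4, 4, 64)))
```

Because the output is a convex combination of bin centers, it always lies inside `[min_depth, max_depth]`, which is why a bins head pairs naturally with a depth (not inverse-depth) output.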
https://github.com/hisfog/SfMNeXt-Impl/blob/bbc5bb989e51eae64848524450404eb6995fef40/trainer.py#L400 Thank you for your prompt reply. I noticed that you commented out this line and directly used the network's output to reconstruct the target image. I think this is why your model directly outputs a depth map.
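For context, the conversion usually applied at that point in Monodepth2-derived trainers looks like the sketch below: it rescales a sigmoid "disparity" in [0, 1] into a metric depth. If this step is commented out, the network's raw output feeds the reprojection directly and is therefore trained as depth. This is a sketch of the standard Monodepth2 helper, not copied from this repo.

```python
import numpy as np

def disp_to_depth(disp, min_depth=0.001, max_depth=80.0):
    # Monodepth2-style rescaling: sigmoid disparity in [0, 1] maps to a
    # scaled disparity in [1/max_depth, 1/min_depth], then depth = 1/disp.
    min_disp = 1.0 / max_depth
    max_disp = 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1.0 / scaled_disp
    return scaled_disp, depth

# With the conversion skipped, the head must output depth directly instead.
_, depth = disp_to_depth(np.array([0.0, 0.5, 1.0]))
```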
Hi, author, thanks for your remarkable work. I attempted training with the following settings:
--dataset kitti --eval_split eigen --height 192 --width 640 --batch_size 16 --num_epochs 25 --model_dim 32 --patch_size 16 --query_nums 120 --scheduler_step_size 15 --eval_mono --post_process --min_depth 0.001 --max_depth 80.0 --backbone resnet18_lite
As training progresses, the loss gradually decreases and the various metrics improve. However, the obtained disp maps look strange. Here are the disp maps obtained at the seventh epoch. They seem to render nearby objects in dark colors and distant objects in bright colors. Is this normal?
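The dark-near/bright-far appearance follows directly from visualizing a depth map instead of a disparity map: depth stores metric distance, so near pixels have small values and render dark under a typical colormap. A tiny toy example (values are made up for illustration):

```python
import numpy as np

# Toy depth map in meters: near pixels have SMALL values (dark),
# far pixels have LARGE values (bright) under a sequential colormap.
depth = np.array([[2.0, 10.0],
                  [30.0, 80.0]])

# The familiar "bright = near" images from self-supervised papers show
# disparity, i.e. inverse depth; inverting flips the polarity.
disp_vis = 1.0 / depth
```

So if the network outputs depth, the coloring you describe is expected; inverting before plotting recovers the usual look.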