yzcjtr / GeoNet

Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
MIT License

How to reproduce the pose results #25

Closed haofeixu closed 6 years ago

haofeixu commented 6 years ago

Hi, @yzcjtr , great work!

I can reproduce the depth results as reported in your paper, but my pose results are not as good, so let me double-check the training settings: the pose task is trained on the KITTI odometry dataset, and the sequence length is 5. Do you use a pre-trained dispnet, or do you train from scratch? Is there anything I am missing?

I am also wondering why your pose results are so good, even outperforming papers with more constraints (like https://arxiv.org/abs/1802.05522, which uses 3D constraints). Is there any insight you can share? What are the key components for the pose task here? Thanks!

yzcjtr commented 6 years ago

Hi @haofeixu , thanks for your interest. We didn't pretrain the dispnet for the visual odometry experiment. How long did you train the network for? Our result was obtained at ~206k iterations.

As for the paper you mentioned, I'd point out that the key to the pose task is the handling of dynamic objects, where some constraints may not be applicable. Also, learning pose from monocular videos suffers from the scale ambiguity issue (I would suggest using scale normalization for more stable training). The work you referred to is impressive, but I think the performance difference is marginal in terms of the pose task.
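To make the scale-normalization suggestion concrete, here is a minimal sketch of one common form of it: dividing each predicted depth map by its per-image spatial mean, so the network cannot reduce its loss by shrinking depth (and translation) toward zero. This is an illustrative NumPy version, not the repository's actual implementation; the function name and shapes are assumptions.

```python
import numpy as np

def scale_normalize(depth, eps=1e-8):
    """Hypothetical helper: divide each depth map in a batch of shape
    (N, H, W) by its spatial mean, removing the global scale so training
    on monocular video is more stable."""
    per_image_mean = depth.reshape(depth.shape[0], -1).mean(axis=1)
    return depth / (per_image_mean[:, None, None] + eps)

# Toy usage: two depth maps that differ only by a global scale factor
d = np.stack([np.full((4, 4), 2.0), np.full((4, 4), 8.0)])
d_norm = scale_normalize(d)
# After normalization, both maps are identical (all values ~1.0),
# illustrating that the global scale has been factored out.
```

In a training loop this normalization would be applied to the predicted depth before computing the photometric and smoothness losses, with the same idea extendable to the predicted translation.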

haofeixu commented 6 years ago

Thanks for your reply @yzcjtr . After processing the odometry dataset, I got 18346 images for training and 2027 images for validation, so 20373 images in total. Is this consistent with your dataset? I trained for ~120k iterations with scale normalization; maybe I should train longer.

yzcjtr commented 6 years ago

Yes, the dataset is consistent.

haofeixu commented 6 years ago

Thank you very much!