Closed haofeixu closed 6 years ago
Hi @haofeixu , thanks for your interest. We didn't pretrain dispnet for the visual odometry experiment. How long did you train the network for? Our result is obtained at ~206k iters.
As for the paper you mentioned, I'd point out that the key to the pose task is the handling of dynamic objects, where some constraints may not be applicable. Also, learning pose from monocular videos suffers from the scale ambiguity issue (I would suggest using scale normalization for more stable training). The work you referred to is impressive, but I think the performance difference is marginal in terms of the pose task.
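For anyone unfamiliar with the scale normalization mentioned above: one common form is to divide each predicted depth map by its mean before computing the loss, so the network cannot shrink the loss simply by globally rescaling its predictions. This is a minimal NumPy sketch of that idea (the function name and epsilon are illustrative, not taken from the repo's code):

```python
import numpy as np

def scale_normalize(depth, eps=1e-8):
    """Divide a predicted depth map by its mean so the training
    objective becomes invariant to the global scale of the prediction.
    (Hypothetical sketch of mean-based scale normalization.)"""
    return depth / (np.mean(depth) + eps)

# Two predictions that differ only by a global scale factor...
d1 = np.random.rand(4, 4) + 0.5
d2 = 3.0 * d1

# ...become identical after normalization, so the loss no longer
# rewards arbitrary global rescaling of depth.
print(np.allclose(scale_normalize(d1), scale_normalize(d2)))
```

With this normalization, the monocular scale ambiguity is factored out of the per-sample objective, which tends to stabilize training.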
Thanks for your reply @yzcjtr . After processing the odometry dataset, I got 18346 images for training and 2027 images for validation, so 20373 images in total. Is this consistent with your dataset? I trained for ~120k iters using scale normalization; maybe I should train longer.
Yes, the dataset is consistent.
Thank you very much!
Hi, @yzcjtr , great work!
I can reproduce the depth results as reported in your paper, but the pose results are not as good, so let me double-check the training setting: the pose task is trained on the KITTI odometry dataset, and the sequence length is 5. Do you use a pre-trained DispNet or train from scratch? Is there anything I am missing?
I am also wondering why your pose results can be so good, even outperforming other papers with more constraints (like https://arxiv.org/abs/1802.05522 with 3D constraints). Is there any insight you can share? What are the key components for the pose task here? Thanks!