yzcjtr / GeoNet

Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
MIT License

Better performance with the depth model (seq. len. 3) than the pose model (seq. len. 5) in camera pose estimation. #49

Closed: jmendozais closed this issue 5 years ago

jmendozais commented 5 years ago

Thanks for your impressive work. I have some questions. I tested the pre-trained depth model on the camera pose estimation task and got an ATE of 0.0087 with a std of 0.0053, which is much better than the pre-trained pose model. I also checked the trajectory on sequence 09 of KITTI, and the result is better there as well. Why was this not considered relevant enough to report?

yzcjtr commented 5 years ago

We used different splits for our different tasks, though all of them were performed on the KITTI dataset. So it is improper to use the pretrained depth model to test pose performance, because it saw the test data of the camera pose experiment during training.

jmendozais commented 5 years ago

Thank you for your response, that's correct.

However, in the same fashion, I trained a GeoNet model with sequences of length 3, using the odometry task split (sequences 00-08) for training, and I got better results than the official ones. I obtained an ATE of 0.0090 with a std of 0.0045 after training for 325k iterations. The complete trajectory on sequence 09 is better as well. Moreover, I compared both results using the metrics proposed on the KITTI dataset site, and the observation still holds. I understand that a sequence length of 5 allows a fair comparison with Zhou's work, but my comparison suggests that a model with a sequence length of 3 is better. Maybe I am wrong. I would like to know your thoughts on these observations.

Thanks in advance,
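
For reference, ATE figures like the mean and std quoted in this thread are typically computed per short snippet and then averaged over all snippets of a test sequence. The sketch below shows one common way to do that: align the first frames, fit a single least-squares scale factor (monocular predictions have no absolute scale), and take the RMSE of the remaining position error. It is an illustration under stated assumptions, not the repository's evaluation script: the function names `snippet_ate` and `summarize_ate` are hypothetical, and the exact normalization used by the official evaluation code may differ.

```python
import numpy as np


def snippet_ate(gt_xyz, pred_xyz):
    """ATE for one snippet of camera positions (hypothetical helper).

    gt_xyz, pred_xyz: array-likes of shape (N, 3), one position per frame.
    Assumption: both trajectories are expressed in the same axis convention.
    """
    gt = np.asarray(gt_xyz, dtype=float)
    pred = np.asarray(pred_xyz, dtype=float)

    # Align both trajectories so the first frame sits at the origin.
    gt = gt - gt[0]
    pred = pred - pred[0]

    # Least-squares scale factor that best maps the prediction onto the
    # ground truth (needed because monocular pose has unknown scale).
    denom = np.sum(pred ** 2)
    scale = np.sum(gt * pred) / denom if denom > 0 else 1.0

    # RMSE of the per-frame position error after alignment.
    err = pred * scale - gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))


def summarize_ate(snippet_pairs):
    """Mean and std of ATE over an iterable of (gt_xyz, pred_xyz) pairs."""
    errors = [snippet_ate(gt, pred) for gt, pred in snippet_pairs]
    return float(np.mean(errors)), float(np.std(errors))
```

With this kind of metric the snippet length matters: errors measured over 3-frame snippets and over 5-frame snippets are accumulated over different trajectory lengths, so the same evaluation snippet length should be used when comparing two models.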