nianticlabs / mickey

[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
https://nianticlabs.github.io/mickey/

Customized dataset and potential application to autonomous driving #19

Open XJTU-Haolin opened 1 month ago

XJTU-Haolin commented 1 month ago

Dear authors,

Mickey is great work and an inspiration for feature representation, feature learning, and pose estimation.

I am exploring whether it can be applied to autonomous driving, e.g. visual odometry and localization. As a first attempt, I want to use the KITTI odometry dataset (sequential images with relative camera poses). I also saw that someone mentioned training Mickey on RealEstate10K and discussed it with you. However, I did not understand what "unscaled dataset" means there, and I still don't see why the translation loss does not work in that case. Could you explain this in more detail?

Given your experience in this area, could you give me some suggestions before I start with a custom KITTI odometry dataset? Is there an efficient way to adapt the original dataloader?

Thanks for your time!

Sincerely, Haolin
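For reference, here is a minimal sketch of how image pairs with relative poses could be built from KITTI odometry ground truth. It assumes the standard KITTI pose file layout (one row-major 3x4 [R|t] matrix per line, mapping each camera frame into the frame of the first camera, with translation in metres). The names `parse_kitti_poses`, `relative_pose`, and `make_pairs` are hypothetical and not part of the Mickey codebase; the format Mickey's own dataloader expects would still need to be checked against the repository.

```python
import numpy as np


def parse_kitti_poses(text):
    """Parse KITTI odometry pose text into an (N, 4, 4) array.

    Each line holds 12 floats: a row-major 3x4 [R|t] matrix that maps
    points from camera i's frame into the first camera's frame.
    """
    poses = []
    for line in text.strip().splitlines():
        values = np.array(line.split(), dtype=np.float64)
        if values.size != 12:
            raise ValueError(f"expected 12 values per line, got {values.size}")
        T = np.eye(4)
        T[:3, :4] = values.reshape(3, 4)
        poses.append(T)
    if not poses:
        raise ValueError("no poses found")
    return np.stack(poses)


def relative_pose(T_0_i, T_0_j):
    """Return (R, t) that maps points from camera i's frame to camera j's frame."""
    T_j_i = np.linalg.inv(T_0_j) @ T_0_i
    return T_j_i[:3, :3], T_j_i[:3, 3]


def make_pairs(poses, gap=1):
    """Pair frame i with frame i + gap (hypothetical sampling strategy)."""
    pairs = []
    for i in range(len(poses) - gap):
        R, t = relative_pose(poses[i], poses[i + gap])
        pairs.append((i, i + gap, R, t))
    return pairs
```

A larger `gap` gives wider baselines between the two images; which baseline range suits training is an open choice here, not something stated in this thread.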

XJTU-Haolin commented 1 month ago

I directly replaced DINOv2 with ResNet50 (from torchvision.models), using the ResNet output at 1/16 resolution, and the accuracy looks bad. Is there anything I should pay attention to?