I directly replaced DINOv2 with ResNet-50 (from torchvision.models), using the ResNet output from the 1/16-downscale layer, and the accuracy looks bad. Is there anything I should pay attention to?
Dear authors,
Hi,
Mickey is great work that inspires research on feature representation, learning, and pose estimation.
I am exploring whether it can be applied to autonomous driving, e.g. visual odometry and localization. For a first attempt, I want to use the KITTI odometry dataset (sequential images with relative camera poses). I also saw that someone mentioned training Mickey on RealEstate10K and discussed it with you. However, I did not understand the "unscaled dataset" point and still don't know why the translation loss is unworkable in that case. Could you explain this in more detail?
You have a lot of experience and knowledge in this area. Before I start on the custom KITTI odometry dataset, could you give me some suggestions? Is there an efficient way to adapt the original dataloader?
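For context, here is a minimal sketch of how I would turn the KITTI ground-truth poses into relative poses for image pairs. It assumes each line of a pose file holds 12 values forming a row-major 3x4 matrix [R|t] that maps camera-i coordinates into the first camera's frame, with translation in metres. The helper names (`parse_kitti_pose`, `relative_pose`, `make_pairs`) and the `max_gap` parameter are my own and not part of the Mickey code, and the direction convention (frame i to frame j) may need to be flipped to match what the original dataloader expects.

```python
def parse_kitti_pose(line):
    """Parse one pose-file line (12 floats, row-major 3x4) into a 4x4 matrix."""
    vals = [float(v) for v in line.split()]
    if len(vals) != 12:
        raise ValueError(f"expected 12 values, got {len(vals)}")
    return [vals[0:4], vals[4:8], vals[8:12], [0.0, 0.0, 0.0, 1.0]]


def invert_rigid(T):
    """Invert a rigid transform [R|t] as [R^T | -R^T t]."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    new_t = [-sum(Rt[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [Rt[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def relative_pose(T_w_i, T_w_j):
    """Transform taking points from camera i's frame to camera j's frame.

    Assumption: both inputs map camera coordinates into a shared world frame.
    """
    return matmul4(invert_rigid(T_w_j), T_w_i)


def make_pairs(pose_lines, max_gap=1):
    """Build (i, j, T_j_i) tuples for frames at most `max_gap` apart."""
    poses = [parse_kitti_pose(l) for l in pose_lines if l.strip()]
    pairs = []
    for i in range(len(poses)):
        for j in range(i + 1, min(i + max_gap + 1, len(poses))):
            pairs.append((i, j, relative_pose(poses[i], poses[j])))
    return pairs
```

With this, `make_pairs(open(pose_file).read().splitlines(), max_gap=3)` would give frame pairs with wider baselines than consecutive frames. Whether a gap larger than 1 is helpful for training is something I have not verified.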
Thanks for your time!
Sincerely, Haolin