Open jeezrick opened 3 months ago
Hi @jeezrick,
We collected approximately 8000 posed frames across the three-floor scene (00862) but used only a subset for reconstruction, consistently skipping 10 frames at a time. Our open-vocabulary mapping currently runs offline and takes from several minutes to hours on large scenes. Nonetheless, the navigation and retrieval parts can be executed online. For the simulation results, we used ground-truth camera poses. In the real world, we relied on accurate camera poses from LiDAR odometry (FAST-LIO2), including pose graph optimization.
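For readers who want to see what the frame skipping amounts to, here is a minimal sketch. It is not the authors' pipeline code. The function name `subsample_frames` and the parameter `skip` are hypothetical, and the sketch assumes that "skipping 10 frames" means keeping one frame and dropping the next 10 (a stride of 11). If the intended meaning is "keep every 10th frame", pass `skip=9` instead.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Sequence[T], skip: int = 10) -> List[T]:
    """Keep one frame, drop the next `skip` frames, and repeat.

    Hypothetical helper for illustration only; the stride interpretation
    (skip + 1) is an assumption, not something stated in the reply.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    # A slice step of skip + 1 keeps indices 0, skip + 1, 2 * (skip + 1), ...
    return list(frames[:: skip + 1])


if __name__ == "__main__":
    # Stand-in for roughly 8000 posed frames; real entries would hold
    # an image, a depth map, and a camera pose.
    frame_ids = list(range(8000))
    kept = subsample_frames(frame_ids, skip=10)
    print(f"kept {len(kept)} of {len(frame_ids)} frames")
```

Under this assumption, about 8000 input frames reduce to several hundred frames that actually enter the offline mapping step, which is one reason a fixed stride is a common way to keep reconstruction time manageable.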
Hello, I read your paper and was thoroughly impressed by the results you achieved.
I was curious about the number of images you collected to reconstruct the three-floor building. It looks quite impressive.
Additionally, I'd like to know if you performed this reconstruction online, while the robot dog was exploring, or if it was done offline.
Furthermore, could you please share how you addressed the noise problem in camera pose estimation?
Thank you for your time and consideration.