hovsg / hovsg.github.io


Question about the experiment in the paper: how many images did you collect to reconstruct the three-floor building? #1

Open jeezrick opened 3 months ago

jeezrick commented 3 months ago

Hello, I read your paper and was thoroughly impressed by the results you achieved.

I was curious about the number of images you collected to reconstruct the three-floor building. It looks quite impressive.

Additionally, I'd like to know if you performed this reconstruction online, while the robot dog was exploring, or if it was done offline.

Furthermore, could you please share how you addressed the noise problem in camera pose estimation?

Thank you for your time and consideration.

martinbchnr commented 3 weeks ago

Hi @jeezrick,

We collected approximately 8000 posed frames across the three-floor scene (00862), but applied a constant frame skip of 10 throughout reconstruction.

Our open-vocabulary mapping approach is currently offline and takes anywhere from several minutes to hours on large scenes. The navigation and retrieval parts, however, can be executed online.

For the simulation results, we used ground-truth camera poses. In the real world, we relied on accurate camera poses from LiDAR odometry with pose graph optimization (FAST-LIO2).
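As a rough illustration of what the frame skip means for the number of frames that actually enter reconstruction, here is a minimal sketch. It assumes that "skipping 10 frames" means keeping every 10th frame (a stride of 10). The function name `subsample_frames` is hypothetical and not taken from the HOV-SG code.

```python
def subsample_frames(num_frames: int, stride: int = 10) -> list[int]:
    """Return the indices of the frames kept when using every `stride`-th frame.

    Assumption: a "frame skip of 10" is interpreted as a stride of 10.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_frames, stride))


# Roughly 8000 posed frames, as stated in the reply above.
kept = subsample_frames(8000, stride=10)
print(len(kept))  # 800
```

Under this reading, about 800 of the 8000 frames are used. If 10 frames are instead dropped between each pair of kept frames, the stride would be 11 and about 728 frames would remain.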