Questions about retrieval stage

v-pnk / cadloc

Benchmark for visual localization on imperfect 3D mesh models from the Internet

https://v-pnk.github.io/cadloc/

BSD 3-Clause "New" or "Revised" License


Questions about retrieval stage #1

Closed by RUiN-jiarun 1 year ago

RUiN-jiarun commented 1 year ago

Thanks for sharing the benchmark!
I have a question about your MeshLoc visual localization process. In the paper you mention that if we only have a 3D model of the scene and no database images, the rendered images are used as the database images. So the image retrieval stage uses rendered images rather than real images? I wonder how well this performs, since low-textured CAD models may lead to bad retrieval results.

v-pnk commented 1 year ago

Thank you for your interest in our work! The results presented in the paper use synthetically generated database camera poses (Fig. 5a), for which we don't have real images, just the renderings. Therefore, the retrieval stage for each real query image searches for the most similar rendered database image. Low-fidelity models influence both the retrieval and matching stages, but estimating which model is more suitable for a deep-learning global descriptor (AP-GeM in our case) based on how the model looks is not that straightforward.
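To make the retrieval step concrete, here is a minimal sketch of a top-k search over global descriptors. It assumes the descriptors (e.g. from AP-GeM) have already been extracted as NumPy arrays and that similarity is measured by cosine similarity; the function name `retrieve_top_k` and its parameters are hypothetical and not taken from the cadloc or MeshLoc code.

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=20):
    """Return indices and similarities of the k database descriptors closest to the query.

    query_desc: (D,) global descriptor of the real query image.
    db_descs:   (N, D) global descriptors of the rendered database images.
    Assumption: cosine similarity is the ranking criterion.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return order, sims[order]
```

With `k=1` this gives the single best retrieved rendering; with `k=20` it gives a shortlist.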

When taking the pose of the single best retrieved database image: [plot: notre_dame_ap-gem_best_of_1]

When taking the best database pose out of 20 retrieved (using oracle): [plot: notre_dame_ap-gem_best_of_20]
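As a rough illustration of the "best of 20 with an oracle" evaluation, the sketch below picks, among the retrieved database poses, the one closest to the ground-truth query pose. The selection criterion (camera-center distance) is an assumption made for this example; the thread does not state which error the oracle uses. The function names are hypothetical.

```python
import numpy as np

def rotation_angle_deg(R_a, R_b):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def oracle_best_of_k(gt_R, gt_c, cand_Rs, cand_cs):
    """Pick the retrieved database pose closest to the ground-truth query pose.

    gt_R, gt_c:       ground-truth rotation (3, 3) and camera center (3,).
    cand_Rs, cand_cs: rotations (K, 3, 3) and camera centers (K, 3) of the retrieved images.
    Assumption: the oracle selects by camera-center distance.
    """
    pos_err = np.linalg.norm(cand_cs - gt_c, axis=1)
    best = int(np.argmin(pos_err))
    return best, float(pos_err[best]), rotation_angle_deg(gt_R, cand_Rs[best])
```

Passing only the top-ranked candidate corresponds to the best-of-1 case.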

The CAD models for reference:

Used AP-GeM checkpoint: Resnet101-AP-GeM-LM18

RUiN-jiarun commented 1 year ago

Thanks for your reply! However, the results in the paper (Fig. 4(a)) show better performance than the best pose out of the top 20 retrieved.
So which MeshLoc configuration produces the results in the paper? For example, the one that reaches a 99% success rate at less than 10% DCRE error, as presented in the paper.

v-pnk commented 1 year ago

The plots in my previous comment use poses of the retrieved database images without the subsequent P3P pose estimation.

The plots from the paper use the retrieval only to get a shortlist of database images, which are then used for local feature matching and P3P pose estimation.
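The difference between the two evaluations can be sketched as a small pipeline skeleton. This is only an outline under stated assumptions: `retrieve`, `match_and_lift`, and `solve_pose` are hypothetical callables standing in for the global-descriptor retrieval, the local feature matching (with 2D-3D lifting via the mesh), and a P3P-based robust solver. Pooling the matches from all shortlisted images before solving is also an assumption of this sketch.

```python
def localize(query, db_images, retrieve, match_and_lift, solve_pose, k=20):
    """Retrieval is used only to shortlist database images; the pose comes from 2D-3D matches.

    retrieve(query, db_images, k)    -> list of shortlisted database images (hypothetical)
    match_and_lift(query, db_image)  -> (points_2d, points_3d) correspondences (hypothetical)
    solve_pose(points_2d, points_3d) -> estimated pose, e.g. P3P inside RANSAC (hypothetical)
    """
    shortlist = retrieve(query, db_images, k)

    # Collect 2D-3D correspondences from every shortlisted database image.
    points_2d, points_3d = [], []
    for db_image in shortlist:
        p2d, p3d = match_and_lift(query, db_image)
        points_2d.extend(p2d)
        points_3d.extend(p3d)

    # P3P needs three correspondences plus one to disambiguate its solutions;
    # with fewer than four points this sketch gives up.
    if len(points_2d) < 4:
        return None
    return solve_pose(points_2d, points_3d)
```

The retrieval-only plots skip everything after the shortlist and simply report the pose of a retrieved database image.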

RUiN-jiarun commented 1 year ago

Oh I got it. Thanks a lot!