Closed RUiN-jiarun closed 1 year ago
Thank you for your interest in our work! The results presented in the paper use synthetically generated database camera poses (Fig. 5a), for which we don't have real images, only renderings. Therefore, the retrieval stage for each real query image searches for the most similar rendered database image. Low-fidelity models influence both the retrieval and matching stages, but judging from a model's appearance which one is more suitable for a deep-learning global descriptor (AP-GeM in our case) is not straightforward.
When taking the pose of the single best retrieved database image:
When taking the best database pose out of 20 retrieved (using oracle):
The CAD models for reference:
Used AP-GeM checkpoint: Resnet101-AP-GeM-LM18
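For readers following along, the two retrieval-only evaluation modes mentioned above (pose of the single best retrieved image versus oracle best pose out of the top k) can be sketched roughly as below. This is only an illustration, not the benchmark's code: descriptors are plain lists of floats, `pose_error` is a hypothetical caller-supplied function standing in for whatever metric is used (e.g. DCRE), and all function names are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero descriptor vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_desc, db_descs, k):
    # Indices of the k rendered database images most similar to the query.
    order = sorted(range(len(db_descs)),
                   key=lambda i: cosine(query_desc, db_descs[i]),
                   reverse=True)
    return order[:k]

def top1_pose(ranked, db_poses):
    # Retrieval-only estimate: reuse the pose of the best-ranked image.
    return db_poses[ranked[0]]

def oracle_pose(ranked, db_poses, gt_pose, pose_error):
    # Oracle: pick the retrieved pose closest to the ground truth.
    # pose_error(pose, gt_pose) -> float is a hypothetical metric.
    return min((db_poses[i] for i in ranked),
               key=lambda p: pose_error(p, gt_pose))
```

The oracle needs the ground-truth pose, so it gives an upper bound on what retrieval alone can achieve with k images rather than a usable localization method.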
Thanks for your reply! However, the result in the paper (Fig. 4(a)) shows better performance than the best pose among the top-20 retrieved images.
So what MeshLoc configuration reproduces the results in the paper? For example, how do you get a 99% success rate at less than 10% DCRE error, as presented in the paper?
The plots in my previous comment use poses of the retrieved database images without the subsequent P3P pose estimation.
The plots from the paper use the retrieval only to get a shortlist of database images, which are then used for local feature matching and P3P pose estimation.
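The difference can be sketched as a small pipeline in which retrieval only supplies the shortlist. This is a hedged outline, not MeshLoc's actual implementation: `match_2d3d` and `solve_pose` are hypothetical callables supplied by the caller, the pooling of matches across all shortlisted images is an assumption (the thread does not say how they are combined), and `min_matches=4` is an illustrative threshold.

```python
def localize(query, ranked_db_ids, match_2d3d, solve_pose, min_matches=4):
    """Estimate a query pose from a retrieved shortlist.

    match_2d3d(query, db_id) -> list of (point2d, point3d) pairs (hypothetical).
    solve_pose(matches) -> pose or None, e.g. P3P inside RANSAC (hypothetical).
    """
    matches = []
    for db_id in ranked_db_ids:
        # Local feature matching against each shortlisted rendering;
        # 3D points are assumed to come from the scene model.
        matches.extend(match_2d3d(query, db_id))
    if len(matches) < min_matches:
        # Too few correspondences for a pose solver to be meaningful.
        return None
    return solve_pose(matches)
```

Because the final pose comes from the 2D-3D matches rather than from any database pose, it can be more accurate than even the oracle best pose among the retrieved images.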
Oh I got it. Thanks a lot!
Thanks for sharing the benchmark!
I've got a question about your MeshLoc visual localization process. In the paper you mention that if we only have a 3D model of the scene and no database images, the rendered images are used as the database images. So does the image retrieval stage use the rendered images rather than real images? I wonder how well this method performs, since a low-texture CAD model may lead to bad retrieval results.