Open · LSK0821 opened this issue 1 month ago
Dear tsattler and v-pnk,

First, I would like to express my admiration for your impressive work on the MeshLoc project. As a researcher working on visual relocalization, I find your project incredibly insightful and valuable.

After reading the MeshLoc code and papers, I have some questions.

Question 1: MeshLoc mentions that the 3D meshes are generated with the SPSR reconstruction algorithm from the dense MVS point clouds. Since SPSR tends to produce smooth surfaces, some of the original surface detail may be lost. Is it still accurate to recover 3D coordinates from depth maps rendered with OpenGL under these circumstances?

Question 2: Even the best 3D mesh reconstruction algorithms face challenges with geometric accuracy, such as holes and distortions. When I opened /2022MeshLoc/aachen_day_night_v11/meshes/AC13_colored.ply in MeshLab, I observed many holes and distorted areas. In these regions, recovering 3D coordinates from OpenGL-rendered depth maps introduces significant errors. How should visual localization handle such issues?

Question 3: In contrast, sparse SfM point clouds provide 3D coordinates directly, avoiding the potential errors introduced by mesh reconstruction and depth-map rendering. Yet on some datasets the localization accuracy with sparse point clouds is even lower. What might be the reason for this?

Thank you so much for your time and for sharing such important work with the community. I truly appreciate any assistance you can provide.
Hi @LSK0821,

Question 1: Naturally, SPSR introduces some errors. We ablated these errors with respect to the octree depth used in SPSR. The number in the mesh name (e.g., AC13) marks the octree depth used in SPSR; a larger octree depth yields a more detailed mesh. You can find the parameters of the generated meshes in Tab. 1. The ablation in Tab. 3 uses real images and rendered depth maps, so the differences between the columns of the table come purely from the different SPSR octree depths. We did not run ablations with other mesh reconstruction algorithms, nor did we vary other SPSR parameters.
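If you want to see the effect of the octree depth yourself, SPSR can be run at different depths through, for example, Open3D's Poisson reconstruction binding. This is only a minimal sketch, not the exact pipeline we used, and the input path is a placeholder:

```python
import open3d as o3d

# Dense MVS point cloud (placeholder path); SPSR requires oriented normals.
pcd = o3d.io.read_point_cloud("dense_mvs_points.ply")
if not pcd.has_normals():
    pcd.estimate_normals()

# Larger octree depth -> finer implicit-function grid -> more detailed mesh
# (e.g., AC13 corresponds to octree depth 13), at higher memory/compute cost.
for depth in (9, 11, 13):
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    o3d.io.write_triangle_mesh(f"mesh_depth{depth}.ply", mesh)
```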
Question 2: Holes in the mesh mean that a 2D-2D correspondence cannot be lifted to a 2D-3D correspondence, so those matches are simply not used for localization. Distorted areas can lead to inaccurate 2D-3D correspondences. We assume that large parts of the depth maps are still valid, resulting in many consistent and accurate 2D-3D correspondences. The inaccurate correspondences are then filtered out by RANSAC during pose estimation, since it picks the minimal sample with the most inliers (which are the consistent and accurate 2D-3D correspondences).
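To make this concrete, here is a minimal sketch of the lifting step, using OpenCV's PnP+RANSAC for illustration (not our exact code; the array names, the `lift_to_3d` helper, and the convention that holes render as depth 0 are assumptions):

```python
import numpy as np
import cv2

def lift_to_3d(pts2d_db, depth_map, K_db, R_db, t_db):
    """Back-project database-image keypoints to 3D world points using the
    depth map rendered from the mesh. Hypothetical helper: R_db, t_db map
    world -> camera (x_cam = R @ x_world + t)."""
    pts3d, valid = [], []
    K_inv = np.linalg.inv(K_db)
    for u, v in pts2d_db:
        d = depth_map[int(round(v)), int(round(u))]
        if d <= 0:  # hole in the mesh: no depth, so the match is dropped
            valid.append(False)
            continue
        x_cam = d * (K_inv @ np.array([u, v, 1.0]))  # back-project to camera frame
        pts3d.append(R_db.T @ (x_cam - t_db))        # camera -> world
        valid.append(True)
    return np.asarray(pts3d), np.asarray(valid)

# pts2d_q: query keypoints matched 2D-2D against database keypoints pts2d_db
# (all arrays are placeholders). Lift the database side to 3D, then estimate
# the query pose; matches from distorted mesh regions end up as RANSAC outliers.
pts3d, valid = lift_to_3d(pts2d_db, depth_map, K_db, R_db, t_db)
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d.astype(np.float64),
    pts2d_q[valid].astype(np.float64),
    K_q, None,                     # query intrinsics, no distortion
    reprojectionError=8.0,
    flags=cv2.SOLVEPNP_P3P)
```

So matches falling into holes never reach the pose solver at all, while matches with inaccurate depth are typically rejected as outliers by RANSAC.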
Question 3: Both pipelines come with their own sets of error sources. SfM point clouds can contain inaccuracies due to errors in local feature extraction and matching. The accuracy of a mesh is influenced by errors in the estimated camera parameters (in our case coming from SfM) and by the errors introduced during MVS and SPSR. It is not surprising to me that these different error sources can, in some cases, result in better localization accuracy when using a mesh.
Thank you very much for your explanation! I still have some points of confusion as follows: