tsattler / visuallocalizationbenchmark


Active Search is not working effectively #69

Open seungjuuuuuu opened 9 months ago

seungjuuuuuu commented 9 months ago

When I took the KingsCollege 3D scene file from Cambridge Landmarks and followed the steps described in the readme.txt of ACG-Localizer, the acg_localizer_active_search output had large rotation and position errors and reported that 0 images were registered. May I ask why that is?

tsattler commented 9 months ago

Because the original code is computing the pose together with the intrinsics. You get much more accurate and stable poses when using a calibrated camera.

seungjuuuuuu commented 9 months ago

Thank you for your prompt response. Regarding your suggestion to use a calibrated camera, could you please clarify at which stage this is recommended? Is it typically provided during scene modeling in Colmap? My research focuses on testing methodologies using datasets like Cambridge Landmarks and 7Scenes. However, based on my understanding, these datasets do not seem to include camera intrinsic parameters. If I intend to proceed with testing the Active Search method on these datasets, what steps should I take?

seungjuuuuuu commented 9 months ago

> Because the original code is computing the pose together with the intrinsics. You get much more accurate and stable poses when using a calibrated camera.

Sorry to bother you again, but I still can't resolve the issue above.

Over the past few days, I used Colmap to convert the .bin files to bundler's output format. The resulting list.txt has the following format, for example:

```
./seq1_frame00016.jpg 0 1665.30750
./seq1_frame00015.jpg 0 1665.65062
./seq1_frame00012.jpg 0 1666.12875
```

I wonder if this is the step where using a calibrated camera, as you suggested, comes into play.

Then I followed the steps prompted by ACG-Localizer. To ensure that a calibrated camera is used, I also generated a list.txt for the query images with the following format:

```
./frame00001.jpg 0 1676.75098
./frame00002.jpg 0 1676.77588
```

Although the steps above executed successfully, the output still shows that 0 images have been registered. I even looked at the code of acg_localizer_active_search to check whether it can take camera intrinsics as input. None of these attempts solved my problem, so I'm asking for your help again. I'd be grateful if you could reply!

tsattler commented 9 months ago

You can get Colmap models for Cambridge and 7Scenes here: https://github.com/cvg/Hierarchical-Localization/tree/master/hloc/pipelines/Cambridge and here https://github.com/cvg/Hierarchical-Localization/tree/master/hloc/pipelines/7Scenes . The intrinsics of the queries should be included there as well.

The original bundler file specification assumes that the coordinate system of each image follows the Computer Graphics convention (x-axis points to the right, y-axis points upwards, camera is looking down the -z-axis). The format generated by Colmap follows the Computer Vision convention (x-axis points to the right, y-axis points downwards, camera is looking down the +z-axis). I'd assume that this causes problems. A description on how to convert between the formats can be found here: https://data.ciirc.cvut.cz/public/projects/2020VisualLocalization/Aachen-Day-Night/README_Aachen-Day-Night.md
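As a rough illustration of the axis flip described above (a minimal sketch only; the exact conversion should be verified against the Aachen README linked in the comment), negating the y- and z-rows of the extrinsics maps a Colmap-convention pose to the Bundler / Computer Graphics convention:

```python
import numpy as np

# Flip matrix: negates the camera's y- and z-axes.
# det(F) = +1, so rotations stay proper rotations.
F = np.diag([1.0, -1.0, -1.0])

def colmap_to_bundler(R_cv, t_cv):
    """Map a Colmap pose (Computer Vision convention: y down, looking
    down +z) to the Bundler convention (Computer Graphics: y up,
    looking down -z) by negating the y- and z-rows of [R | t].

    Assumed conversion; double-check against the Aachen README.
    """
    return F @ R_cv, F @ t_cv
```

Applying the flip twice is the identity, which is a quick sanity check that the mapping is self-inverse.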

I don't remember whether the ACG Localizer code supports calibrated cameras or not. I wrote this more than 11 years ago.

seungjuuuuuu commented 9 months ago

Thank you very much for your reply! I will follow your suggestion for further testing.

Additionally, may I ask if you have the test results for Active Search on each query image for every scene in Cambridge Landmarks and 7Scenes? My experiments are specifically aimed at obtaining these test results, rather than focusing on the median error for each scene. If you could share them with me, I would be extremely grateful.

tsattler commented 9 months ago

Poses per image for 7Scenes and 12Scenes can be found in this repository: https://github.com/tsattler/visloc_pseudo_gt_limitations

Poses for Cambridge Landmarks are here: https://drive.google.com/file/d/1xY459_o7XFLAtrhK_i8Kqbn9UZS50pKc/view?usp=sharing

The poses should follow the Colmap format (qw qx qy qz tx ty tz). I am not sure whether these are exactly the same poses used to compute the statistics in the PixLoc paper, but they should be comparable.

seungjuuuuuu commented 9 months ago

When I tried to test the performance of each image, I found that it had a large position error.

For example, seq2/frame00002 in active_search_1_1_markt_paris_10k has the following pose information: [qw qx qy qz tx ty tz] = 0.65112 0.619581 0.303496 -0.31631 -12.3897 5.00842 73.4588

The gt released by the original author has the following pose information: [X Y Z W P Q R] = 65.474678 -35.436835 1.599990 0.650746 0.620279 0.302991 -0.316196

When I compute the position error between [-12.3897, 5.00842, 73.4588] and [65.474678, -35.436835, 1.599990], I get 113.5369. Is something wrong here?

tsattler commented 9 months ago

tx ty tz define the translation, not the position. X Y Z specify the position, not the translation. You cannot directly compare the numbers. You get the estimated position of Active Search as -R^T * t, where t = [tx, ty, tz]^T is the translation vector stored in the file and R is the rotation matrix defined by (qw, qx, qy, qz) (R^T is the transpose (or inverse) of that matrix).
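To make the relationship concrete, here is a small sketch (plain numpy, no Colmap dependency) that recovers the camera position C = -R^T t from a `qw qx qy qz tx ty tz` line, using the Active Search pose quoted earlier in the thread:

```python
import numpy as np

def quat_to_rotmat(qw, qx, qy, qz):
    """Rotation matrix from a quaternion in Colmap order (qw qx qy qz)."""
    w, x, y, z = np.array([qw, qx, qy, qz]) / np.linalg.norm([qw, qx, qy, qz])
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def camera_center(qw, qx, qy, qz, tx, ty, tz):
    """Camera position C = -R^T t; t is the stored translation, not C."""
    R = quat_to_rotmat(qw, qx, qy, qz)
    return -R.T @ np.array([tx, ty, tz])

# Active Search pose for seq2/frame00002 from the thread:
C = camera_center(0.65112, 0.619581, 0.303496, -0.31631,
                  -12.3897, 5.00842, 73.4588)
# C comes out near [65.6, -35.6, 1.5], i.e. within roughly 0.25 m of
# the ground-truth position (65.474678, -35.436835, 1.599990).
```

So once the translation is converted to a position, the estimate is actually close to the ground truth; the 113 m "error" came from comparing t directly against C.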

seungjuuuuuu commented 9 months ago

> tx ty tz define the translation, not the position. X Y Z specify the position, not the translation. You cannot directly compare the numbers. You get the estimated position of Active Search as -R^T * t, where t = [tx, ty, tz]^T is the translation vector stored in the file and R is the rotation matrix defined by (qw, qx, qy, qz) (R^T is the transpose (or inverse) of that matrix).

Thank you so much for your constant replies and help to me!