tsattler / visuallocalizationbenchmark

Benchmark at different thresholds #34

Closed lokhande-vishnu closed 3 years ago

lokhande-vishnu commented 3 years ago

The benchmark reports results at the thresholds (0.25m, 2°) / (0.5m, 5°) / (5m, 10°); however, papers like R2D2 report results at (0.5m, 2°) / (1m, 5°) / (5m, 10°). How do I obtain results at the same thresholds as in these papers? Reference: Table 4 of https://arxiv.org/pdf/1906.06195.pdf

Also, there seems to be a mismatch between the results reported in the paper and the ones obtained from the benchmark. Do you have any advice on the reason for the difference? After extracting the features with the R2D2 model (code: https://github.com/naver/r2d2/blob/master/extract.py), I run

python reconstruction_pipeline.py \
    --dataset_path /local/aachen \
    --colmap_path /local/colmap/build/src/exe \
    --method_name r2d2

I then upload the resulting Aachen_eval_[r2d2].txt file to https://www.visuallocalization.net/submission/ and obtain [DAY] 0.0 / 0.0 / 0.0, [NIGHT] 67.3 / 81.6 / 93.9, while the paper reports 45.9 / 65.3 / 86.7.

tsattler commented 3 years ago

In June 2020, we updated the ground truth poses of the Aachen nighttime images, as announced on the webpage:

2020-06-18: The Aachen Day-Night v 1.1 dataset has been added. This is an extension of the Aachen Day-Night dataset that contains additional night-time queries. Please refer to the corresponding arXiv paper for more information. Additionally, the reference poses for the night-time query images in the original Aachen Day-Night dataset have been updated to more accurate poses. The results in the Aachen Day-Night table may therefore have changed. Since we consider the new night-time poses to be more accurate, we have also changed the error thresholds to be the same as the ones for daytime.

As part of this update, the poses of the nighttime images used for evaluation were updated, resulting in better numbers on the benchmark. Because the poses are more accurate, we decided to also use tighter thresholds. The results on the website have been updated accordingly, and I would suggest comparing against them.
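To illustrate why the same poses give different numbers under the two threshold sets quoted in this thread, here is a minimal sketch of how a "percentage of queries localized within (X m, Y°)" figure can be computed. It assumes you already have a per-query position error (metres) and orientation error (degrees), which is not something the submission server hands out as far as this thread indicates. The function name `recall_at_thresholds`, the example errors, and the use of `<=` rather than `<` are all assumptions made for illustration, not the benchmark's actual evaluation code.

```python
# Threshold sets quoted in this thread: (position in metres, orientation in degrees).
BENCHMARK_THRESHOLDS = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]
R2D2_PAPER_THRESHOLDS = [(0.5, 2.0), (1.0, 5.0), (5.0, 10.0)]


def recall_at_thresholds(errors, thresholds):
    """Return the percentage of queries within each (metres, degrees) threshold.

    `errors` is a list of (position_error_m, orientation_error_deg) pairs,
    one per query image. A query counts only if BOTH limits are satisfied.
    Using `<=` for the comparison is an assumption of this sketch.
    """
    if not errors:
        raise ValueError("at least one query is required")
    recalls = []
    for max_pos, max_rot in thresholds:
        hits = sum(1 for pos, rot in errors if pos <= max_pos and rot <= max_rot)
        recalls.append(round(100.0 * hits / len(errors), 1))
    return recalls


# Made-up errors for five hypothetical queries, for illustration only.
example_errors = [(0.1, 1.0), (0.4, 1.5), (0.8, 4.0), (3.0, 8.0), (12.0, 30.0)]

print(recall_at_thresholds(example_errors, BENCHMARK_THRESHOLDS))
print(recall_at_thresholds(example_errors, R2D2_PAPER_THRESHOLDS))
```

With these made-up errors, the looser first two thresholds admit more queries than the tighter ones, so numbers reported under different threshold sets are not directly comparable.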

I hope that answers your questions.

The arXiv paper mentioned above is this one: https://arxiv.org/pdf/2005.05179.pdf