tsattler / visuallocalizationbenchmark


Does better performance at a lower threshold imply better performance at a higher threshold? #35

Closed: lokhande-vishnu closed this issue 4 years ago

lokhande-vishnu commented 4 years ago

As per the website, evaluations on the Aachen Day-Night dataset are performed at the thresholds (0.25m, 2°) / (0.5m, 5°) / (5m, 10°). Smaller thresholds such as (0.25m, 2°) are stricter and harder to meet than larger thresholds such as (5m, 10°). Based on this, one might expect a method that performs better at the smaller thresholds to also perform better at the larger ones. However, some of the results reported on the benchmark suggest that this is not the case. What could explain a method performing better at a smaller threshold yet worse at a larger one?
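A minimal sketch of how percentages at these thresholds can be computed, assuming each query already has a position error in meters and a rotation error in degrees, and assuming a query counts only if both errors fall within the threshold pair (inclusive). The function name `percent_localized` is hypothetical; this is not the benchmark's own evaluation code.

```python
from typing import List, Tuple

# Threshold pairs from the benchmark website:
# (maximum position error in meters, maximum rotation error in degrees).
THRESHOLDS: List[Tuple[float, float]] = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]


def percent_localized(
    errors: List[Tuple[float, float]],
    thresholds: List[Tuple[float, float]] = THRESHOLDS,
) -> List[float]:
    """Percentage of queries within each threshold pair (hypothetical helper).

    `errors` holds one (position_error_m, rotation_error_deg) tuple per query;
    a failed localization can be represented by infinite errors.
    """
    percentages = []
    for max_pos_m, max_rot_deg in thresholds:
        # Assumption: a query counts only if BOTH errors are within the pair (inclusive).
        hits = sum(1 for pos, rot in errors if pos <= max_pos_m and rot <= max_rot_deg)
        percentages.append(100.0 * hits / len(errors))
    return percentages


inf = float("inf")
example = [(0.1, 1.0), (0.4, 3.0), (2.0, 8.0), (inf, inf)]
print(percent_localized(example))  # [25.0, 50.0, 75.0]
```

Because a pose within (0.25m, 2°) is also within the two looser pairs, a single method's three percentages can only stay equal or grow from left to right.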

Some examples from https://www.visuallocalization.net/benchmark/ for the Aachen Day-Night dataset (night images):

  1. d2-net-ydb (77.6 / 84.7 / 93.9) is better at the two smaller thresholds but worse at the largest threshold than HF_SG_4096_nv_50_sp (67.3 / 80.6 / 96.9)
  2. attention 5K (71.4 / 83.7 / 91.8) vs. HF_SG_4096_nv_50_sp (67.3 / 80.6 / 96.9)
  3. SuperPoint (baseline) (73.5 / 79.6 / 88.8) vs. rootsift_upright_8k_seedmatcher_sink0.2_256_34 (68.4 / 82.7 / 96.9)
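The pattern in these three examples can be checked mechanically. The sketch below uses only the numbers quoted in the list; the names `RESULTS`, `LABELS` and `winners` are illustrative, not part of any benchmark tooling.

```python
from typing import Dict, List, Tuple

# Night-image results quoted above: % of queries localized within
# (0.25m, 2°) / (0.5m, 5°) / (5m, 10°).
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "d2-net-ydb": (77.6, 84.7, 93.9),
    "attention 5K": (71.4, 83.7, 91.8),
    "SuperPoint (baseline)": (73.5, 79.6, 88.8),
    "HF_SG_4096_nv_50_sp": (67.3, 80.6, 96.9),
    "rootsift_upright_8k_seedmatcher_sink0.2_256_34": (68.4, 82.7, 96.9),
}
LABELS = ("0.25m, 2°", "0.5m, 5°", "5m, 10°")

# Within a single method the numbers never decrease, because a stricter
# threshold pair is contained in every looser one.
assert all(a <= b <= c for a, b, c in RESULTS.values())


def winners(m1: str, m2: str) -> List[Tuple[str, str]]:
    """Return, for each threshold pair, the method with the higher percentage."""
    out = []
    for label, x, y in zip(LABELS, RESULTS[m1], RESULTS[m2]):
        out.append((label, m1 if x > y else m2 if y > x else "tie"))
    return out


for m1, m2 in [
    ("d2-net-ydb", "HF_SG_4096_nv_50_sp"),
    ("attention 5K", "HF_SG_4096_nv_50_sp"),
    ("SuperPoint (baseline)", "rootsift_upright_8k_seedmatcher_sink0.2_256_34"),
]:
    print(f"{m1} vs {m2}:", winners(m1, m2))
```

In all three pairs, the method ranked first at (0.25m, 2°) is ranked second at (5m, 10°), even though each method's own percentages are non-decreasing.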

Thank you

tsattler commented 4 years ago

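Why should better performance at the stricter threshold necessarily imply better performance at the larger thresholds? Images that are not localized within the stricter thresholds are typically harder to localize. A method might be able to find enough matches for precise pose estimation on the easier test images, but not on the harder ones. Such a method would perform well at the strict threshold, but not at the larger ones. Conversely, a method that is less precise on the easy images but still finds enough matches for a coarse pose on many of the hard images would score lower at the strict threshold and higher at the loose one.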
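This explanation can be illustrated with a toy example. The per-query errors below are invented for illustration and are not benchmark data: hypothetical method A is very precise on six easy queries and fails on four hard ones, while hypothetical method B is less precise on the easy queries but recovers coarse poses for three of the hard ones. The helper assumes, as a simplification, that a query counts only if both its position and rotation errors are within a threshold pair.

```python
from typing import List, Tuple

INF = float("inf")  # stands in for a failed localization
# (maximum position error in meters, maximum rotation error in degrees)
THRESHOLDS = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]


def percent_localized(errors: List[Tuple[float, float]]) -> List[float]:
    # Share of queries whose (position m, rotation deg) error is within each threshold pair.
    return [
        100.0 * sum(p <= tp and r <= tr for p, r in errors) / len(errors)
        for tp, tr in THRESHOLDS
    ]


# Hypothetical test set: 6 "easy" and 4 "hard" queries.
# Method A: very precise on easy images, but too few matches on hard ones.
method_a = [(0.1, 0.5)] * 6 + [(INF, INF)] * 4

# Method B: less precise on easy images, but coarse poses for most hard ones.
method_b = [(0.2, 1.0)] * 4 + [(0.4, 3.0)] + [(1.0, 6.0)] + [(2.0, 7.0)] * 3 + [(INF, INF)]

print("A:", percent_localized(method_a))  # A: [60.0, 60.0, 60.0]
print("B:", percent_localized(method_b))  # B: [40.0, 50.0, 90.0]
```

Method A ranks higher at the two stricter thresholds and method B at the loosest one, the same pattern as in the examples from the question.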