Does better performance at a lower threshold imply better performance at a higher threshold?

As per the website, the evaluations on the Aachen Day-Night dataset are performed at the thresholds (0.25m, 2°) / (0.5m, 5°) / (5m, 10°). Smaller thresholds like (0.25m, 2°) are stricter and harder to reach than larger thresholds like (5m, 10°): every pose that is within 0.25m and 2° is also within 5m and 10°. Based on this, one might expect that a method that performs better at the smaller thresholds would also perform better at the larger ones. However, some of the results reported on the benchmark suggest that this is not the case. What is a possible reason why a method could perform better at a smaller threshold yet perform worse at the larger thresholds?
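One way to see why this can happen, sketched under stated assumptions: each benchmark number is the percentage of queries whose pose error is within both the position and the orientation bound, so the three numbers are points on a cumulative error curve, and two such curves can cross. The per-query errors below are made up for illustration and do not come from any submission. Hypothetical method A is very precise when it succeeds but fails badly on some queries; hypothetical method B is less precise but rarely fails outright.

```python
# Threshold pairs used on the benchmark: (max position error in m, max rotation error in deg).
THRESHOLDS = [(0.25, 2.0), (0.5, 5.0), (5.0, 10.0)]

def accuracy_at_thresholds(errors, thresholds=THRESHOLDS):
    """Percentage of queries whose pose error is within both bounds of each threshold pair.

    `errors` is a list of (position error in m, rotation error in deg) tuples, one per query.
    """
    n = len(errors)
    result = []
    for max_pos, max_rot in thresholds:
        hits = sum(1 for pos, rot in errors if pos <= max_pos and rot <= max_rot)
        result.append(100.0 * hits / n)
    return result

# Hypothetical method A: 6 very precise poses, 1 medium pose, 3 gross failures.
method_a = [(0.1, 0.5)] * 6 + [(0.4, 3.0)] + [(50.0, 90.0)] * 3
# Hypothetical method B: 4 very precise poses, 2 medium, 3 coarse but usable, 1 gross failure.
method_b = [(0.1, 0.5)] * 4 + [(0.4, 3.0)] * 2 + [(2.0, 6.0)] * 3 + [(50.0, 90.0)]

print("A:", accuracy_at_thresholds(method_a))  # A: [60.0, 70.0, 70.0]
print("B:", accuracy_at_thresholds(method_b))  # B: [40.0, 60.0, 90.0]
```

In this toy setup A is ahead at the two strict thresholds while B is ahead at (5m, 10°), because B turns more of A's gross failures into coarse but acceptable poses.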

Some examples from https://www.visuallocalization.net/benchmark/ for the Aachen Day-Night dataset (Night images), given as the percentage of localized queries at each of the three thresholds:

d2-net-ydb (77.6 / 84.7 / 93.9) is better at the two smaller thresholds but worse at the largest one than HF_SG_4096_nv_50_sp (67.3 / 80.6 / 96.9)
attention 5K (71.4 / 83.7 / 91.8) vs. HF_SG_4096_nv_50_sp (67.3 / 80.6 / 96.9)
SuperPoint (baseline) (73.5 / 79.6 / 88.8) vs. rootsift_upright_8k_seedmatcher_sink0.2_256_34 (68.4 / 82.7 / 96.9)
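The pairs above can be checked mechanically. A small sketch that uses only the Night-image numbers quoted in this issue and reports, per threshold, which method of each pair is ahead (the helper name `leaders` is just for illustration):

```python
THRESHOLD_LABELS = ["0.25m, 2°", "0.5m, 5°", "5m, 10°"]

# Night-image accuracies (%) as quoted above from the benchmark page.
PAIRS = [
    ("d2-net-ydb", (77.6, 84.7, 93.9), "HF_SG_4096_nv_50_sp", (67.3, 80.6, 96.9)),
    ("attention 5K", (71.4, 83.7, 91.8), "HF_SG_4096_nv_50_sp", (67.3, 80.6, 96.9)),
    ("SuperPoint (baseline)", (73.5, 79.6, 88.8),
     "rootsift_upright_8k_seedmatcher_sink0.2_256_34", (68.4, 82.7, 96.9)),
]

def leaders(acc_a, acc_b):
    """Per threshold, return 'A' or 'B' for whichever accuracy is higher, or 'tie'."""
    return ["A" if a > b else "B" if b > a else "tie" for a, b in zip(acc_a, acc_b)]

for name_a, acc_a, name_b, acc_b in PAIRS:
    lead = leaders(acc_a, acc_b)
    # A reversal means each method is strictly ahead at some threshold.
    reversal = {"A", "B"} <= set(lead)
    print(f"{name_a} vs. {name_b}: {dict(zip(THRESHOLD_LABELS, lead))}, reversal={reversal}")
```

For the first two pairs the ranking flips only at (5m, 10°); for the SuperPoint pair it already flips at (0.5m, 5°).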

Thank you
