zju3dv / LoFTR

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
https://zju3dv.github.io/loftr/
Apache License 2.0

scale issue #226

jeannotes commented 2 years ago

Hi, thanks for the great work. I have several questions:

  1. Suppose my input image is (384, 384), batch size 20, window size 5. When doing fine matching with the fine features (192, 192), you use torch's `unfold` with stride and padding to get `feat_f0_unfold` of shape (20, 3200, 2304). Then in `FineMatching` you compute a heatmap and the normalized coordinates; I can follow that part. But when computing `expec_f_gt`, you use `w_pt0_i` and `pt1_i` like this: `expec_f_gt = (w_pt0_i[b_ids, i_ids] - pt1_i[b_ids, j_ids]) / scale / radius`. Why divide by scale and by radius? Debugging shows scale is 2 and radius is 2, yet `expec_f_gt` does not look normalized. I hope you can explain.

  2. Unlike dense matching approaches, this paper first does a coarse match and then searches finely around those coarse matches, so it seems to be a two-step algorithm. Am I right?

  3. In coarse matching, you sample GT correspondences. Does this affect accuracy? Any experiments?
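For reference, the unfold shapes mentioned in question 1 can be reproduced with a small sketch. The channel dimension is an assumption inferred from the question (3200 = 128 × 5 × 5, so C = 128); the stride equals the coarse-to-fine resolution ratio:

```python
import torch
import torch.nn.functional as F

# Assumed values inferred from the question: 3200 = 128 * 5 * 5, so C = 128.
N, C = 20, 128
H = W = 192              # fine feature map: 384 / 2
window = 5               # fine window size
stride = 4               # coarse stride / fine stride = 8 / 2
padding = window // 2    # center one window on each coarse cell

feat_f0 = torch.randn(N, C, H, W)
feat_f0_unfold = F.unfold(feat_f0, kernel_size=window,
                          stride=stride, padding=padding)
# (N, C * window**2, L) = (20, 3200, 2304):
# one 5x5 patch per cell of the 48x48 coarse grid
print(feat_f0_unfold.shape)  # torch.Size([20, 3200, 2304])
```

So the 2304 windows correspond one-to-one to the 48 × 48 coarse-level positions, and each window flattens to 3200 values.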

zehongs commented 2 years ago

Hi,

  1. The `_i` suffix means coordinates at the image resolution. Dividing by `scale` converts to the fine-feature resolution, and `radius` is the half window size used to unfold at that feature resolution. So `scale * radius` is half the window size at the image resolution, and dividing by it expresses the GT offset relative to the fine window.
  2. Technically, yes. We call it coarse-to-fine.
  3. Are you referring to not padding GT coarse matches when training the fine-level matcher? We have not experimented with this.
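To illustrate the normalization in point 1, here is a minimal sketch using the values from the question (scale 2, radius 2); the pixel offsets are made up for illustration:

```python
import torch

scale = 2    # image resolution / fine-feature resolution (384 / 192)
radius = 2   # half the fine window: 5 // 2

# Hypothetical pixel offsets between the warped GT point (w_pt0_i) and the
# coarse-match center (pt1_i), measured at image resolution.
offset_px = torch.tensor([[4.0, -4.0],   # GT on a window corner
                          [1.0,  2.0]])  # GT inside the window

expec_f_gt = offset_px / scale / radius
print(expec_f_gt)  # tensor([[ 1.0000, -1.0000], [ 0.2500,  0.5000]])
```

An offset of ±1 means the GT point lands exactly on the border of the 5 × 5 fine window; values with magnitude above 1 fall outside the window, which is why the division is needed even though the raw offsets are small pixel counts.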