Closed AliYoussef97 closed 7 months ago
Sorry for the late reply. This is a great question! The "+2" is due to our two-stage refinement process. In the first stage, we crop non-overlapping 8x8 fine windows from both images and take the argmax of the confidence. In the second stage, we crop 3x3 local windows around those argmax positions in the right image. If the argmax from the first stage lies on the edge of an 8x8 window, its 3x3 neighborhood extends one pixel past that window, so the second stage has to crop the 3x3 local window from a 10x10 fine window (8 + 2, one extra pixel on each side) in the right image. Because the 8x8 fine windows do not overlap, we cannot handle this by zero-padding the 3x3 local windows or by discarding edge argmaxes. Therefore, in the fine preprocess, we unfold 10x10 fine windows in the right image in advance for use in the second stage.
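To make the edge case concrete, here is a minimal pure-Python sketch (not the repository's code, which works on tensors). It assumes a toy single-channel feature map, and the helper names `make_image`, `crop_padded_window` and `local_3x3` are hypothetical. It shows that a 3x3 neighborhood around an argmax at the corner of an 8x8 window is fully available only if the window was cropped as 10x10.

```python
W = 8    # fine window size mentioned in the thread
PAD = 1  # one extra pixel per side, giving (W + 2) x (W + 2) windows


def make_image(h, w):
    """Toy feature map (assumption): each pixel stores its flat index."""
    return [[r * w + c for c in range(w)] for r in range(h)]


def crop_padded_window(img, win_row, win_col):
    """Crop the 8x8 window (win_row, win_col) plus a 1-pixel border.

    Pixels that fall outside the image are filled with None.
    """
    h, w = len(img), len(img[0])
    top, left = win_row * W - PAD, win_col * W - PAD
    out = []
    for r in range(top, top + W + 2 * PAD):
        row = []
        for c in range(left, left + W + 2 * PAD):
            inside = 0 <= r < h and 0 <= c < w
            row.append(img[r][c] if inside else None)
        out.append(row)
    return out


def local_3x3(padded, r, c):
    """3x3 neighborhood centred on (r, c), given in 8x8-window coordinates."""
    # (r, c) sits at (r + PAD, c + PAD) in the padded window, so the
    # neighborhood spans padded rows r..r+2 and padded columns c..c+2.
    return [row[c:c + 3] for row in padded[r:r + 3]]


img = make_image(24, 24)
padded = crop_padded_window(img, 1, 1)   # 8x8 window covering rows/cols 8..15
corner_patch = local_3x3(padded, 0, 0)   # argmax at the window's top-left corner
```

Here `corner_patch` is centred on `img[8][8]` and its first row and column come from the neighboring windows, which a plain 8x8 crop would not contain.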
@wyf2020 Thank you so much this makes so much sense!
Just for clarification: the $8 \times 8$ window `softmax_matrix_f` -> `get_fine_ds_match` to get the keypoints from the $8 \times 8$ window. Following that, keypoints -> get the $3 \times 3$ local window from `conf_matrix_ff`, centered around the pixel-level keypoints. I hope my understanding is correct?
Yes, exactly! Your understanding is correct.
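The index bookkeeping behind that second step can be sketched as follows. This is only an illustration under stated assumptions: it assumes both the 8x8 and the 10x10 windows are flattened in row-major order, and the helper name `local_window_indices` is hypothetical rather than taken from the repository.

```python
W = 8  # fine window size; the right-image window is (W + 2) x (W + 2)


def local_window_indices(idx_r):
    """Map a flat argmax index in the 8x8 window to the nine flat indices
    of its 3x3 neighborhood inside the 10x10 window (row-major, assumed)."""
    r, c = divmod(idx_r, W)  # position inside the 8x8 window
    # The 10x10 window is offset by one pixel, so (r, c) maps to (r + 1, c + 1)
    # and the neighborhood covers rows r..r+2 and columns c..c+2.
    return [(r + dr) * (W + 2) + (c + dc) for dr in range(3) for dc in range(3)]


top_left = local_window_indices(0)               # argmax at corner (0, 0)
bottom_right = local_window_indices(W * W - 1)   # argmax at corner (7, 7)
```

Under these assumptions every index stays within `0 .. (W + 2)**2 - 1`, even for corner argmaxes, which is why the last dimension is `(W+2)**2` rather than `W**2`.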
@wyf2020 Thank you so much!
Hello!
I had a brief question regarding image_1's fine features' dimension, in particular the addition of +2 when unfolding the local windows here. I fail to understand the reasoning behind the +2, as along the pipeline `conf_matrix_ff` has a size of `[M, W**2, (W+2)**2]` here. Although `softmax_matrix_f` does become `[M, WW, WW]`, `conf_matrix_ff` is stored as `[M, W**2, (W+2)**2]`. I would really appreciate it if you could provide an explanation for the +2.
Thank you!