mihaidusmanu / local-feature-refinement

Multi-View Optimization of Local Feature Geometry
BSD 3-Clause "New" or "Revised" License

Questions about the usage of two-view refinement #1


Nyohohoho commented 4 years ago

Thank you very much for your amazing work and kind sharing.

I am sorry to bother you at a busy time, but could you show me how to use your two-view refinement code? I currently have an algorithm that extracts matched keypoints from two images. I save them as:

keypoints1 (shape (2000, 2)) and keypoints2 (shape (2000, 2)), where 2000 is the number of keypoints and 2 corresponds to the two coordinates (x and y).

They are already matched, meaning (keypoints1[i][0], keypoints1[i][1]) corresponds to (keypoints2[i][0], keypoints2[i][1]). In this case, how can I apply your two-view refinement? Since I am currently focusing only on the two-view scenario, I would like to try your powerful method.

I would really appreciate it if you could help me with this naive question. Thank you again for your great contributions to the community.

mihaidusmanu commented 4 years ago

Currently, there is no easy way to run the refinement on two views only. I will try to explain it here; the process is similar to compute_match_graph.py.

You first have to call refine_matches_coarse_to_fine as follows:

displacements12 = refine_matches_coarse_to_fine(
    image1, keypoints1,
    image2, keypoints2,
    matches,
    net, device, batch_size, symmetric=False, grid=False
)

where keypoints1 is Ax2, keypoints2 is Bx2, and matches is Mx2 (the first and second columns contain the feature indices in image 1 and image 2, respectively). This will return an Mx2 array containing the "correction" that needs to be applied to the keypoint in the second image for each match.

Please note that the keypoints are expected in (x, y) format, where x points right and y points down, while the returned flow is in (y, x) order.

To update the keypoints, you can do something along the lines of:

dx = displacements12[:, 1]
dy = displacements12[:, 0]
keypoints2[matches[:, 1], 0] += dx * 16
keypoints2[matches[:, 1], 1] += dy * 16
# Only valid if feature extraction is run on the full-resolution image.
# Otherwise, you also need to multiply by the downsampling factor between
# the original image 2 and the image2 used in the call to
# refine_matches_coarse_to_fine.

In your case, you can set matches to the identity mapping, i.e., np.stack([np.arange(2000), np.arange(2000)]).T.
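Putting the steps above together, a minimal end-to-end sketch for your setting could look like the following. This assumes image1 / image2 are the full-resolution images, and that refine_matches_coarse_to_fine, net, device, and batch_size are set up the same way as in compute_match_graph.py (the loading code is in the repository and is not repeated here):

import numpy as np

# The keypoints are already in one-to-one correspondence, so the match
# matrix is simply the identity mapping: match i pairs keypoint i in
# image 1 with keypoint i in image 2.
matches = np.stack([np.arange(2000), np.arange(2000)]).T  # Mx2

displacements12 = refine_matches_coarse_to_fine(
    image1, keypoints1,
    image2, keypoints2,
    matches,
    net, device, batch_size, symmetric=False, grid=False
)

# The returned flow is (y, x) while the keypoints are (x, y).
dx = displacements12[:, 1]
dy = displacements12[:, 0]

# Undo the patch normalization (33x33 patches normalized to [-1, 1],
# hence the factor of 16); keypoints2 must be a float array.
keypoints2[matches[:, 1], 0] += dx * 16
keypoints2[matches[:, 1], 1] += dy * 16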

Let me know if you run into any issues!

I will try to prepare a quick script for the two-view case and add it to the repository!

lihanlun commented 2 years ago

Hi mihaidusmanu, thanks a lot for your amazing work and kind sharing. I am very sorry to bother you. I found that the output of refine_matches_coarse_to_fine is always less than one pixel (tested on the Herzjesu and Fountain datasets). Even if I add an extra pixel of offset to the keypoint locations, the output of refine_matches_coarse_to_fine is still less than one pixel. Can this function only handle errors of less than one pixel? I would really appreciate it if you could answer my question. Thanks again for sharing your code.

mihaidusmanu commented 2 years ago

Hello. Inside our pipeline, we use 33x33 patches for refinement, and the coordinates inside these patches are normalized such that the top left corner is (-1, -1) and the bottom right is (1, 1). The outputs of refine_matches_coarse_to_fine are also normalized accordingly. If you want pixel displacements, you will need to multiply by 16 (to undo the normalization) and potentially also by the scaling factor used during feature extraction. I have edited my previous comment to address this.
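To make the conversion concrete, here is a tiny numeric example (the values below are made up for illustration, not taken from the code or paper):

# Hypothetical values for illustration only.
normalized_d = 0.05            # raw output of refine_matches_coarse_to_fine
pixel_d = normalized_d * 16    # 0.8 px inside the 33x33 refinement patch
scale = 2.0                    # downsampling factor applied before extraction
pixel_d_original = pixel_d * scale  # 1.6 px in the original image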

You can refer to the following snippet, for instance:

https://github.com/mihaidusmanu/local-feature-refinement/blob/2e28c182f74328e2d7ae727de1ee4003cb4d921b/reconstruction-scripts/colmap_utils.py#L136-L137

Regarding keypoints moving by more than one pixel: that is definitely possible, but it heavily depends on the initial features you are trying to refine. For SIFT there might be very few keypoints that move by a large amount, while for learned features the number will be higher.

lihanlun commented 2 years ago

Oh, I understand. Thank you very much for your help.