zju3dv / LoFTR

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
https://zju3dv.github.io/loftr/
Apache License 2.0

Robustness against rotation and flipping #213

Closed · repoit closed this 2 years ago

repoit commented 2 years ago

For my use case, images might be flipped or rotated. However, the model doesn't perform well on such images (see the examples below). Do you believe the model would become robust to these transformations if I added rotation and flipping augmentation during training?

[Image ex2: two images showing the same scene, with high matching rates]

[Image ex1: when one image is flipped, no good matches are detected]

Varun-Tandon14 commented 2 years ago

@repoit you might want to try SE2-LoFTR (official repo). From my limited experience, SE2-LoFTR seemed to perform reasonably well for rotations of up to 30 degrees. Hopefully, this helps.

zehongs commented 2 years ago

Augmentation will be helpful. I believe SE2-LoFTR can also handle this problem to a certain extent. Another useful trick is to run LoFTR multiple times, e.g. [{image0, image1}, {rot90-image0, image1}, {rot180-image0, image1}, {rot270-image0, image1}, {flipped-image0, image1}], and take the union of all derived matches. While this can be batched, it can require a lot of GPU memory and a longer runtime 😂
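
A minimal sketch of this multi-pass trick, assuming the inference interface from the official demo (the matcher fills `mkpts0_f`/`mkpts1_f` in the batch dict); the helper `unrotate_pts` and the file paths are illustrative:

```python
import cv2
import numpy as np
import torch
from src.loftr import LoFTR, default_cfg

matcher = LoFTR(config=default_cfg)
matcher.load_state_dict(torch.load("weights/outdoor_ds.ckpt")["state_dict"])
matcher = matcher.eval().cuda()

def load_gray(path):
    # LoFTR expects grayscale input in [0, 1]; sides should be divisible by 8
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return torch.from_numpy(img)[None, None].float().cuda() / 255.0

def unrotate_pts(pts, k, rot_hw):
    """Map (x, y) keypoints found on an image rotated k*90 deg CCW back to
    the original frame. rot_hw is the (H, W) of the rotated image."""
    h, w = rot_hw
    pts = pts.copy()
    for _ in range(k % 4):
        x, y = pts[:, 0].copy(), pts[:, 1].copy()
        pts[:, 0] = h - 1 - y   # invert one 90-degree CCW step
        pts[:, 1] = x
        h, w = w, h
    return pts

img0, img1 = load_gray("image0.png"), load_gray("image1.png")
variants = [torch.rot90(img0, k, dims=(-2, -1)) for k in range(4)]
variants.append(torch.flip(img0, dims=[-1]))              # horizontal flip
all0, all1 = [], []
for k, img0_v in enumerate(variants):
    batch = {"image0": img0_v, "image1": img1}
    with torch.no_grad():
        matcher(batch)
    mk0 = batch["mkpts0_f"].cpu().numpy()
    mk1 = batch["mkpts1_f"].cpu().numpy()
    if k < 4:
        mk0 = unrotate_pts(mk0, k, img0_v.shape[-2:])
    else:
        mk0[:, 0] = img0.shape[-1] - 1 - mk0[:, 0]        # undo the flip
    all0.append(mk0)
    all1.append(mk1)

mkpts0, mkpts1 = np.concatenate(all0), np.concatenate(all1)  # union of matches
```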

repoit commented 2 years ago

Augmentation will be helpful.

Great. I will then set up the training pipeline using MegaDepth and augment the existing training data with rotated/flipped cases. Is there anything important I should consider before doing so, or is it sufficient to just augment the data and follow your instructions for training?
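
A hedged sketch of what such an augmentation hook could look like in a MegaDepth-style dataset `__getitem__` (key names follow the LoFTR batch convention; `aug_rot0`/`aug_flip0` are made-up keys for passing the applied transform on to the supervision code):

```python
import random
import torch

def augment_pair(data):
    """Rotate/flip only image0; 'depth0' and 'K0' stay in the original frame,
    and the supervision maps coordinates back (see the sketch further down).
    Note: rot90 of a non-square image swaps H and W; with square padding, as
    in the default MegaDepth config, shapes stay consistent."""
    k = random.randint(0, 3)                 # number of 90-degree CCW turns
    flip = random.random() < 0.5             # horizontal flip
    if k:
        data["image0"] = torch.rot90(data["image0"], k, dims=(-2, -1))
    if flip:
        data["image0"] = torch.flip(data["image0"], dims=[-1])
    data["aug_rot0"], data["aug_flip0"] = k, flip   # consumed by supervision
    return data
```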

Another useful trick is to run LoFTR multiple times, e.g. [{image0, image1}, {rot90-image0, image1}, {rot180-image0, image1}, {rot270-image0, image1}, {flipped-image0, image1}], and take the union of all derived matches. While this can be batched, it can require a lot of GPU memory and a longer runtime 😂

I like this approach, but efficiency is key in our application.

SE2-LoFTR

The SE2 model does indeed work better (see the example below). It would be interesting to compare the results of SE2-LoFTR against LoFTR trained with augmented data.

[Image se2_example: comparison of SE2-LoFTR on the non-flipped (left) and flipped (right) pair, using the 8rot.ckpt model weights]

zehongs commented 2 years ago

Training LoFTR requires a lot of GPUs. And augmentation is likely to hurt performance when matching "normal" image pairs.

ACSL-ricardo commented 1 year ago

@repoit Did you (or anyone else) end up applying data augmentation? It seems necessary to change the spvs_coarse function in supervision.py so that the ground truth is adjusted according to the rotation applied. Am I right?
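
A hedged sketch of the coordinate fix this would require (variable names are illustrative, not the repo's actual ones): spvs_coarse builds ground truth by warping a coarse grid from image0 to image1 via depth and camera geometry, so if image0 was rotated at load time, the grid points can be mapped back to the unrotated frame first, where the original depth and intrinsics still apply.

```python
import torch

def unrotate_grid(pts, k, rot_hw):
    """Invert a k*90-degree CCW rotation for (..., 2) tensors of (x, y) points.
    rot_hw is the (H, W) of the rotated image the points currently live in.
    A horizontal flip would be undone analogously with x -> w - 1 - x."""
    h, w = rot_hw
    pts = pts.clone()
    for _ in range(k % 4):
        x, y = pts[..., 0].clone(), pts[..., 1].clone()
        pts[..., 0] = h - 1 - y   # undo one 90-degree CCW step
        pts[..., 1] = x
        h, w = w, h
    return pts

# inside spvs_coarse, before the depth lookup / projection into image1
# ('aug_rot0' matches the hypothetical augmentation hook sketched earlier):
# grid_pt0_orig = unrotate_grid(grid_pt0, data["aug_rot0"],
#                               data["image0"].shape[-2:])
```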