[EN] Faster & better (learned) feature extraction (SuperPoint / SuperGlue etc.) & matching (LoFTR etc.)

Feature request / Suggestion

Currently, we use the executables from AliceVision Meshroom (and COLMAP for Gaussian Splatting) to perform all the photogrammetry steps, including feature extraction and feature matching. Although we have added batching and parallelization to improve throughput, these steps are still limited in several ways. The main ones are that overall processing is slower than with COLMAP, and that feature extraction relies on traditional computer vision heuristics, which learning-based approaches such as SuperPoint / SuperGlue / LightGlue (see: glue-factory), combined with LoFTR / DeepMatcher (also see: SuperCOLMAP), could improve on. This project would involve surveying the state of the art to find the most suitable methods, then integrating them into the Tirtha pipeline (with tests).

Possible implementation

Survey the state of the art (see the Resources section for some starting points) and gather results for a few candidate methods on several small image sets from Tirtha's database.
Check whether the best-performing models are a good fit for Tirtha (FOSS licensing and ease of integration).
Implement the best model(s) in Tirtha's pipeline.
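
As a starting point for the survey step, the comparison could be driven by a small harness like the sketch below. It is only an illustration: the `Matcher` interface (a callable taking two image paths and returning a list of index pairs) and the `benchmark_matchers` name are hypothetical and not part of Tirtha, Meshroom, or COLMAP. Each candidate (SuperPoint + LightGlue, LoFTR, and so on) would need its own thin wrapper to fit this interface. Match count and wall-clock time are rough proxies, so a real comparison would probably also need a quality measure taken after reconstruction.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface (an assumption for this sketch): a matcher takes two
# image paths and returns (index_in_a, index_in_b) keypoint correspondences.
Matcher = Callable[[str, str], List[Tuple[int, int]]]


def benchmark_matchers(
    matchers: Dict[str, Matcher],
    pairs: Sequence[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Run every matcher on every image pair; report mean match count and time."""
    if not pairs:
        raise ValueError("at least one image pair is required")

    results: Dict[str, Dict[str, float]] = {}
    for name, match in matchers.items():
        counts: List[int] = []
        seconds: List[float] = []
        for img_a, img_b in pairs:
            start = time.perf_counter()
            matches = match(img_a, img_b)
            seconds.append(time.perf_counter() - start)
            counts.append(len(matches))
        results[name] = {
            "mean_matches": mean(counts),
            "mean_seconds": mean(seconds),
        }
    return results


if __name__ == "__main__":
    # Placeholder matcher standing in for a real wrapper.
    def dummy_matcher(img_a: str, img_b: str) -> List[Tuple[int, int]]:
        return [(0, 0), (1, 1)]

    report = benchmark_matchers(
        {"dummy": dummy_matcher},
        [("a.jpg", "b.jpg"), ("b.jpg", "c.jpg")],
    )
    for name, stats in report.items():
        print(name, stats)
```

Keeping the harness independent of any specific model means the same image sets and metrics can be reused as new methods are added to the survey.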

Resources

SuperCOLMAP - SuperPoint in COLMAP
Deep Visual SLAM Frontends: SuperPoint, SuperGlue, and SuperMaps (#CVPR2020 Invited Talk) - YouTube
Image Matching Webui - a Hugging Face Space by Realcat, for comparing several current image matching methods.
Some useful discussion here - https://github.com/openMVG/openMVG/issues/2224.
A recent (Jan 2024) method: DeDoDe

Self-check

[x] I have checked smlab-niser/tirtha-public and could not find any related open or closed feature request.
