awesome-local-global-descriptor
These are my personal notes on local and global descriptors, written to make it easier for anyone to get into these fields.
If you find anything you want to add, feel free to open an issue or email me.
This repo is also a by-product of the literature survey for our paper UR2KiD. If you find this repo useful, please also consider citing our paper.
@article{yang2020ur2kid,
title={UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision},
author={Yang*, Tsun-Yi and Nguyen*, Duy-Kien and Heijnen, Huub and Balntas, Vassileios},
journal={arXiv preprint arXiv:2001.07252},
year={2020}
}
This repo will be constantly updated.
Author: Tsun-Yi Yang (shamangary@hotmail.com)
Online talks
Year |
Topic |
Link |
[ECCV20] |
MLAD Workshop |
morning, afternoon |
[3DV20] |
3DGV Talk: Marc Pollefeys - 3D geometric vision |
youtube |
[CVPR20] |
Image Matching Workshop |
youtube |
[CVPR20] |
CVPR2020 tutorial: Local Features: From SIFT to Differentiable Methods |
youtube |
[CVPR20] |
Deep Visual SLAM Frontends: SuperPoint, SuperGlue, and SuperMaps |
youtube |
Local matching pipeline
This section reviews sparse keypoint matching and its pipeline.
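As a rough illustration of the matching step in such a pipeline, here is a minimal sketch of nearest-neighbour descriptor matching with a ratio test and a mutual check. It is written in plain Python with no dependencies; the function names, the toy descriptors, and the `ratio=0.8` threshold are assumptions made for this example, not taken from any paper listed here.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_order(query, candidates):
    """Indices of candidates sorted from closest to farthest."""
    return sorted(range(len(candidates)), key=lambda k: l2(query, candidates[k]))

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs that pass the ratio test and the mutual check."""
    matches = []
    for i, d in enumerate(desc_a):
        order = nearest_order(d, desc_b)
        if not order:
            continue
        best = order[0]
        # Ratio test: the best match must be clearly closer than the runner-up.
        if len(order) > 1:
            d1 = l2(d, desc_b[best])
            d2 = l2(d, desc_b[order[1]])
            if d1 > ratio * d2:
                continue
        # Mutual check: i must also be the nearest neighbour of its match.
        if nearest_order(desc_b[best], desc_a)[0] == i:
            matches.append((i, best))
    return matches

# Toy 2-D descriptors (hypothetical values, for illustration only).
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
desc_b = [[0.0, 0.9], [0.9, 0.1], [5.0, 5.0]]
print(match_descriptors(desc_a, desc_b))
```

The third descriptor in `desc_a` is ambiguous between two candidates, so the ratio test drops it. Real pipelines usually use a vectorised or approximate nearest-neighbour search instead of this brute-force loop.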
1. Keypoint detection
This subsection reviews keypoint detection and the estimation of its orientation, scale, or affine transformation.
Year |
Paper |
Link |
Code |
[CVPR20] |
Holistically-Attracted Wireframe Parsing |
arXiv |
github |
[CVPR20] |
KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects |
arXiv |
link |
[3DV19] |
SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning |
arXiv |
Github |
[ICCV19] |
Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters |
PDF |
Github |
[ECCV18] |
Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability |
arXiv |
Github |
[CVPR17] |
Learning Discriminative and Transformation Covariant Local Feature Detectors |
PDF |
Github |
[CVPR17] |
Quad-networks: unsupervised learning to rank for interest point detection |
PDF |
- |
[CVPR16] |
Learning to Assign Orientations to Feature Points |
- |
Github |
[CVPR15] |
TILDE: a Temporally Invariant Learned DEtector |
arXiv |
Github |
Year |
Paper |
link |
Code |
[ECCV20] |
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization |
link |
github |
[ICCV19] |
USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds |
arXiv |
Github |
[arXiv19] |
Self-Supervised 3D Keypoint Learning for Ego-motion Estimation |
arXiv |
Github |
2. Keypoint description (local descriptor)
Over the last few decades, research has focused on patch descriptors.
Year |
Paper |
link |
Code |
[CVPR16] |
Accumulated Stability Voting: A Robust Descriptor from Descriptors of Multiple Scales |
PDF |
Github |
[CVPR15] |
Domain-Size Pooling in Local Descriptors: DSP-SIFT |
PDF |
- |
[CVPR15] |
BOLD - Binary Online Learned Descriptor For Efficient Image Matching |
PDF |
Github |
[CVPR13] |
Boosting binary keypoint descriptors |
- |
- |
[CVPR12] |
FREAK: Fast Retina Keypoint |
- |
- |
[CVPR12] |
Three things everyone should know to improve object retrieval |
PDF |
- |
[IPOL11] |
ASIFT: An Algorithm for Fully Affine Invariant Comparison |
- |
- |
[ICCV11] |
BRISK: Binary robust invariant scalable keypoints |
- |
- |
[ICCV11] |
ORB: An efficient alternative to SIFT or SURF |
- |
- |
[ICCV11] |
Local intensity order pattern for feature description |
- |
- |
[CVIU08] |
Speeded-up robust features (SURF) |
- |
- |
[ECCV06] |
SURF: Speeded up robust features |
- |
- |
[IJCV04] |
Distinctive image features from scale-invariant keypoints |
- |
Github |
Year |
Paper |
link |
Code |
[TIP19] |
Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval |
arXiv |
Github |
[ICCV19] |
Beyond Cartesian Representations for Local Descriptors |
PDF |
- |
[CVPR19] |
SOSNet: Second Order Similarity Regularization for Local Descriptor Learning |
arXiv,Page |
Github |
[ECCV18] |
GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints |
- |
Github |
[CVPR18] |
Local Descriptors Optimized for Average Precision |
Page |
- |
[NIPS17] |
Working hard to know your neighbor's margins: Local descriptor learning loss |
arXiv |
Github |
[ICCV17] |
DeepCD: Learning Deep Complementary Descriptors for Patch Representations |
PDF |
Github |
[CVPR17] |
L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space |
PDF |
Github |
[arXiv16] |
PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors |
arXiv |
Github |
[BMVC16] |
Learning local feature descriptors with triplets and shallow convolutional neural networks |
PDF |
Github |
[ICCV15] |
Discriminative Learning of Deep Convolutional Feature Point Descriptors |
Page |
Github |
[CVPR15] |
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching |
PDF |
- |
[CVPR15] |
Learning to compare image patches via convolutional neural networks |
PDF |
Github |
Year |
Paper |
link |
Code |
[arXiv19] |
DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds |
arXiv |
- |
3. End-to-end matching pipeline
Recently, more and more papers try to embed the whole matching pipeline (keypoint detection, keypoint description) into one framework.
Year |
Paper |
link |
Code |
[arXiv20] |
Dense Semantic 3D Map Based Long-Term Visual Localization with Hybrid Features |
arXiv |
- |
[arXiv20] |
D2D: Learning to find good correspondences for image matching and manipulation |
arXiv |
- |
[arXiv20] |
DISK: Learning local features with policy gradient |
arXiv |
- |
[arXiv20] |
D2D: Keypoint Extraction with Describe to Detect Approach |
arXiv |
- |
[arXiv20] |
HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning |
arXiv |
- |
[arXiv20] |
Learning Feature Descriptors using Camera Pose Supervision |
arXiv |
- |
[arXiv20] |
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions |
arXiv |
github |
[arXiv20] |
S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching |
arXiv |
- |
[CVPR20] |
ASLFeat: Learning Local Features of Accurate Shape and Localization |
arXiv |
github,tfmatch |
[CVPR20] |
Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task |
arXiv |
- |
[WACV19] |
DGC-Net: Dense Geometric Correspondence Network |
arXiv |
github |
[NIPS19] |
R2D2: Repeatable and Reliable Detector and Descriptor |
arXiv,Page |
Github |
[ICCV19] |
ELF: Embedded Localisation of Features in Pre-Trained CNN |
PDF |
Github |
[CVPR19] |
RF-Net: An End-to-End Image Matching Network based on Receptive Field |
arXiv |
Github |
[CVPR19] |
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features |
arXiv,Page |
Github |
[BMVC19] |
Matching Features without Descriptors: Implicitly Matched Interest Points |
PDF |
github |
[CVPRW18] |
SuperPoint: Self-Supervised Interest Point Detection and Description |
arXiv |
Github,3rd_party |
[NIPS18] |
LF-Net: Learning Local Features from Images |
PDF |
Github |
[ECCV16] |
LIFT: Learned Invariant Feature Points |
- |
Github |
Year |
Paper |
link |
Code |
[CVPR20] |
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features |
arXiv |
github |
[arXiv20] |
StickyPillars: Robust feature matching on point clouds using Graph Neural Networks |
arXiv |
- |
3.5. Dense descriptor
Unlike local keypoint descriptors, which depend on detected keypoints, some works compute a dense descriptor representation for the whole image.
Year |
Paper |
link |
Code |
[ICRA20] |
GN-Net: The Gauss-Newton Loss for Multi-Weather Relocalization |
arXiv, MyNote |
Web |
[ICCV17] |
CLKN: Cascaded Lucas-Kanade Networks for Image Alignment |
PDF |
- |
4. Geometric verification or learning based matcher
After matching, standard RANSAC and its variants are usually adopted for outlier removal.
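To make the outlier-removal idea concrete, here is a minimal RANSAC sketch in plain Python. It assumes a toy model (a pure 2D translation between the two images) so that one correspondence is a minimal sample. Real pipelines fit a homography, fundamental matrix, or essential matrix instead. The function name, threshold, iteration count, and sample data are all hypothetical.

```python
import random

def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D translation from putative matches.

    matches: list of ((x1, y1), (x2, y2)) correspondences.
    Returns ((tx, ty), inlier_indices).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Minimal sample: one correspondence fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Score the hypothesis by counting matches that agree with it.
        inliers = [
            idx for idx, ((ax, ay), (bx, by)) in enumerate(matches)
            if (ax + tx - bx) ** 2 + (ay + ty - by) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (tx, ty), inliers
    if best_inliers:
        # Refit on all inliers: the least-squares translation is the mean offset.
        n = len(best_inliers)
        tx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
        ty = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
        best_model = (tx, ty)
    return best_model, best_inliers

# Five matches that roughly follow a (10, -5) shift, plus two gross outliers.
matches = [
    ((0, 0), (10, -5)), ((1, 2), (11, -3)), ((3, 1), (13.5, -4)),
    ((4, 4), (14, -1.5)), ((2, 5), (12, 0)),
    ((5, 5), (40, 40)), ((6, 1), (-20, 3)),
]
model, inliers = ransac_translation(matches)
print(model, inliers)
```

The variants listed in this section mostly change how samples are drawn, how hypotheses are scored, or how the inlier threshold is handled, while keeping this hypothesise-and-verify loop.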
Year |
Paper |
link |
Code |
[ECCV20] |
Making Affine Correspondences Work in Camera Geometry Computation |
arXiv |
github |
[arXiv20] |
AdaLAM: Revisiting Handcrafted Outlier Detection |
arXiv |
github |
[arXiv20] |
Multi-View Optimization of Local Feature Geometry |
arXiv |
- |
[CVPR19] |
MAGSAC: Marginalizing Sample Consensus |
PDF |
Github |
[CVPR16] |
Progressive Feature Matching with Alternate Descriptor Selection and Correspondence Enrichment |
PDF |
- |
[CVPR13] |
Robust Feature Matching with Alternate Hough and Inverted Hough Transforms |
PDF |
- |
[ECCV12] |
Improving Image-Based Localization by Active Correspondence Search |
PDF |
- |
[CVPR05] |
Matching with PROSAC – Progressive Sample Consensus |
PDF |
- |
[CVPR05] |
Two-View Geometry Estimation Unaffected by a Dominant Plane |
PDF |
Github |
Year |
Paper |
link |
Code |
[ECCV20] |
Online Invariance Selection for Local Feature Descriptors |
arXiv |
github |
[CVPR20] |
SuperGlue: Learning Feature Matching with Graph Neural Networks |
arXiv |
Github |
[CVPR20] |
High-dimensional Convolutional Networks for Geometric Pattern Recognition |
arXiv, youtube |
- |
[CVPR20] |
ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning |
arXiv |
github |
[arXiv20] |
RANSAC-Flow: generic two-stage image alignment |
arXiv, youtube |
page,Github |
[ICCV19] |
NG-RANSAC for Epipolar Geometry from Sparse Correspondences |
arXiv |
Github |
[ICCV19] |
Learning Two-View Correspondences and Geometry Using Order-Aware Network |
arXiv |
Github |
[CVPR18] |
Learning to Find Good Correspondences |
- |
Github |
Year |
Paper |
link |
Code |
[arXiv20] |
Deep Global Registration |
arXiv, youtube |
- |
[Access18] |
Multi-Temporal Remote Sensing Image Registration Using Deep Convolutional Features |
PDF |
Github |
Global retrieval
Since global retrieval usually has to search over a large number of candidates, there are several ways to generate a single description for each image.
1. Feature aggregation
When only hand-crafted local descriptors were available, people usually aggregated a set of local descriptors into a single description.
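As one example of such aggregation, here is a minimal sketch of a VLAD-style encoding in plain Python: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the result is flattened and L2-normalised. The codebook is assumed to be given (for instance from k-means), and the toy values are hypothetical. The refinements discussed in the papers below (such as extra normalisation steps) are not included.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD-style vector."""
    k, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k)]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        nearest = min(
            range(k),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(d, centroids[c])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for j in range(dim):
            residuals[nearest][j] += d[j] - centroids[nearest][j]
    # Flatten to a single k * dim vector and L2-normalise it.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy codebook with two centroids and three 2-D local descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
image_vector = vlad(descriptors, centroids)
print(image_vector)
```

Whatever the number of local descriptors, the output always has length `k * dim`, which is what makes it usable as a single image-level description.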
Year |
Paper |
link |
Code |
[ICCV13] [IJCV15] |
To aggregate or not to aggregate: Selective match kernels for image search; Image search with selective match kernels: aggregation across single and multiple images |
ICCV, IJCV |
Official : matlab, from DELF (tensorflow) |
[CVPR13] |
All about VLAD |
PDF |
- |
[ECCV10] |
Improving the fisher kernel for large-scale image classification |
PDF |
- |
[CVPR07] |
Object retrieval with large vocabularies and fast spatial matching |
PDF |
- |
[CVPR07] |
Fisher kernels on visual vocabularies for image categorization |
PDF |
- |
A similar idea, but using deep learning to adapt the classical algorithms:
Year |
Paper |
link |
Code |
[ECCV16] |
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples |
PDF |
- |
[CVPR16] |
NetVLAD: CNN architecture for weakly supervised place recognition |
Page |
Github |
2. Real-valued descriptor
These methods produce a single real-valued representation for the whole image.
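One common way to get a single vector out of a CNN feature map is a pooling layer. Below is a minimal sketch of generalized-mean (GeM) pooling in plain Python, assuming the feature map is given as a list of channels with non-negative activations. The function name, the `eps` clamp, and the toy values are assumptions for this example. In practice the pooled vector is usually L2-normalised afterwards.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling: one value per channel.

    feature_map: list of channels, each a flat list of activations.
    p=1 gives average pooling; large p approaches max pooling.
    """
    pooled = []
    for channel in feature_map:
        # Clamp to a small positive value so the power is well defined.
        clamped = [max(a, eps) for a in channel]
        mean_pow = sum(a ** p for a in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled

# One toy channel with three spatial activations.
toy_map = [[1.0, 2.0, 3.0]]
print(gem_pool(toy_map, p=1.0))   # behaves like average pooling
print(gem_pool(toy_map, p=3.0))   # lies between the mean and the max
print(gem_pool(toy_map, p=50.0))  # close to max pooling
```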
Year |
Paper |
link |
Code |
[ECCV20] |
Learning and aggregating deep local descriptors for instance-level recognition |
arXiv |
github |
[ECCV20] |
Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings |
arXiv |
github |
[ECCV20] |
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval |
arXiv |
github |
[ECCV20] |
SOLAR: Second-Order Loss and Attention for Image Retrieval |
arXiv |
- |
[ECCV20] |
Unifying Deep Local and Global Features for Efficient Image Search |
arXiv |
- |
[arXiv19] |
ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval |
arXiv |
- |
[TIP19] |
REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval |
arXiv |
- |
[ICCV19] |
Learning with Average Precision: Training Image Retrieval with a Listwise Loss |
arXiv |
Github |
[CVPR19] |
Detect-to-Retrieve: Efficient Regional Aggregation for Image Search |
PDF |
Github |
[TPAMI18] |
Fine-tuning CNN Image Retrieval with No Human Annotation |
arXiv |
Github |
[IJCV17] |
End-to-end Learning of Deep Visual Representations for Image Retrieval |
arXiv |
Github |
[ICCV17] |
Large-Scale Image Retrieval with Attentive Deep Local Features |
- |
Github |
[ECCV16] |
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples |
arXiv |
Github |
3. Binary descriptor and quantization
For more compact representation, a binary descriptor can be generated from hashing or thresholding. Quantization is also very popular in large-scale image retrieval.
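A minimal sketch of the thresholding route, in plain Python: each dimension of a real-valued descriptor becomes one bit, and codes are compared with the Hamming distance. Thresholding at zero without any learned projection is an assumption made for simplicity. Methods such as Iterative Quantization learn a rotation before binarizing. The function names and toy vectors are hypothetical.

```python
def binarize(vector, threshold=0.0):
    """Pack a real-valued descriptor into an integer bit code.

    Each dimension contributes one bit: 1 if above the threshold, else 0.
    """
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def hamming(code_a, code_b):
    """Number of differing bits between two codes."""
    return bin(code_a ^ code_b).count("1")

# Toy 4-D descriptors (hypothetical values).
query = binarize([0.3, -1.2, 0.8, -0.1])
similar = binarize([0.5, -0.7, 0.9, 0.2])
different = binarize([-0.4, 1.0, -0.3, 0.6])
print(hamming(query, similar), hamming(query, different))
```

Comparing codes only needs an XOR and a bit count, which is why binary descriptors are attractive for large databases.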
Year |
Paper |
link |
Code |
[ICCVW19] |
DAME WEB: DynAmic MEan with Whitening Ensemble Binarization for Landmark Retrieval without Human Annotation |
PDF |
Github |
[CVPR19] |
FastAP: Deep Metric Learning to Rank |
PDF |
Github |
[CVPR18] |
Hashing as Tie-Aware Learning to Rank |
PDF |
Github |
[AAAI18] |
Deep Region Hashing for Generic Instance Search from Image |
- |
- |
[TPAMI18] |
Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks |
- |
- |
[TPAMI13] |
Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval |
PDF |
- |
[TPAMI10] |
Product quantization for nearest neighbor search |
PDF |
- |
4. Pre-processing/Post-processing
Anything that can boost performance at the pre- or post-processing stage, such as rectification, re-ranking, or query expansion.
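As an illustration of query expansion, here is a minimal sketch of average query expansion in plain Python: the query is averaged with its top-ranked results and the database is searched again. It assumes the top results can be trusted without any spatial verification, and the function names, `top_k=2`, and toy vectors are hypothetical.

```python
import math

def normalize(v):
    """L2-normalise a vector (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def rank(query, database):
    """Database indices sorted by descending dot product with the query."""
    scores = [sum(q * d for q, d in zip(query, vec)) for vec in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])

def average_query_expansion(query, database, top_k=2):
    """Return (initial_ranking, ranking_after_expansion)."""
    query = normalize(query)
    database = [normalize(v) for v in database]
    initial = rank(query, database)
    top = initial[:top_k]
    # New query: mean of the original query and its top-k neighbours.
    expanded = [
        (query[j] + sum(database[i][j] for i in top)) / (len(top) + 1)
        for j in range(len(query))
    ]
    return initial, rank(normalize(expanded), database)

# Toy global descriptors (hypothetical values).
database = [[0.9, 0.1, 0.0], [0.8, 0.6, 0.0], [0.2, 0.9, 0.0], [0.3, 0.0, 0.95]]
initial, reranked = average_query_expansion([1.0, 0.0, 0.0], database)
print(initial, reranked)
```

In this toy case the expanded query moves towards the top results, so an item that resembles them gets promoted over one that only resembles the original query.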
Year |
Paper |
link |
Code |
[arXiv20] |
Image Stylization for Robust Features |
arXiv |
- |
[ECCV20] |
Single-Image Depth Prediction Makes Feature Matching Easier |
arXiv |
github |
[CVPR19] |
Local features and visual words emerge in activations |
PDF |
- |
[CVPR12] |
Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking |
PDF |
- |
5. 3D point cloud
Year |
Paper |
link |
Code |
[CVPR18] |
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition |
arXiv |
Github |
Multi-tasking local and global descriptors
Some works cover both local description and global retrieval, since the two tasks share similar network activations and applications.
Year |
Paper |
link |
Code |
[arXiv20] |
UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision |
arXiv |
- |
[CVPR19] |
ContextDesc: Local Descriptor Augmentation with Cross-Modality Context |
- |
Github |
[CVPR19] |
From Coarse to Fine: Robust Hierarchical Localization at Large Scale with HF-Net |
arXiv |
Github |
[ICCV17] |
Large-Scale Image Retrieval with Attentive Deep Local Features (DELF) |
- |
Github |
Review papers
Year |
Paper |
link |
Code |
[arXiv18] |
From handcrafted to deep local features |
arXiv |
- |
[CVPR17] |
Comparative Evaluation of Hand-Crafted and Learned Local Features |
PDF |
- |
Metric learning
Year |
Paper |
link |
Code |
[arXiv20] |
Metric learning: cross-entropy vs. pairwise losses |
arXiv |
- |
[arXiv19] |
A Metric Learning Reality Check |
arXiv |
- |
SfM
Year |
Paper |
link |
Code |
[arXiv20] |
Reducing Drift in Structure from Motion using Extended Features |
arXiv |
- |
MVS
Year |
Paper |
link |
Code |
[CVPR20] |
Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement |
arXiv |
github |
[CVPR20] |
BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks |
arXiv |
github |
View Synthesis/Novel view/Image completion
Year |
Paper |
link |
Code |
[ECCV20] |
Flow-edge Guided Video Completion |
arXiv |
link |
[arXiv20] |
Reference Pose Generation for Visual Localization via Learned Features and View Synthesis |
arXiv |
- |
[CVPR20] |
BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks |
arXiv |
github |
Segmentation localization
Year |
Paper |
link |
Code |
[ICCV19] |
Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization |
arXiv |
github |
Benchmarks
Local matching
Year |
Paper |
link |
Code |
Note |
[arXiv20] |
Image Matching across Wide Baselines: From Paper to Practice |
arXiv |
github |
[CVPR17] |
HPatches: A benchmark and evaluation of handcrafted and learned local descriptors |
arXiv |
Github |
HPatches |
[TPAMI11] |
Discriminative learning of local image descriptors |
Page |
- |
UBC/Brown dataset (subsets: Liberty (New York), Notre Dame (Paris), and Half Dome (Yosemite)) |
[CVPR08] |
On Benchmarking Camera Calibration and MultiView Stereo for High Resolution Imagery |
Global retrieval
Year |
Paper |
link |
Code |
Note |
[CVPR18] |
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking |
Page |
Github |
ROxford5k, RParis6k |
[CVPR07] |
Object retrieval with large vocabularies and fast spatial matching |
Page |
- |
Oxford5k |
[CVPR08] |
Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases |
Page |
- |
Paris6k |
Localization (both local matching and global retrieval)
Year |
Paper |
link |
Code |
Note |
[ECCV20] |
Map-based Localization for Autonomous Driving |
web |
github1, github2 |
- |
[CVPR18] |
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions |
PDF,Page |
Github |
Aachen-day-night, Robotcar, CMU-seasons |
Toolbox
Year |
Paper |
link |
[2020] |
Kapture |
github |
[2020] |
hloc - the hierarchical localization toolbox |
github |
[2020] |
pyslamv2 |
github |