Taxonomy of sequential descriptor methods.
This is the official repository for the paper "Learning Sequential Descriptors for Sequence-based Visual Place Recognition". It can be used to reproduce results from the paper and experiment with a wide range of sequential descriptor methods for Visual Place Recognition.
Create your local environment and then install the required packages using:
pip install -r pip_requirements.txt
# to install the official TimeSformer package
git clone https://github.com/facebookresearch/TimeSformer
cd TimeSformer
python setup.py build develop
The experiments in the paper use two main datasets: Mapillary Street-Level Sequences (MSLS) and Oxford RobotCar.
We are currently exploring hosting options, so this is only a partial list of models. More models will be added soon. If you need a particular model, feel free to open an issue and we will provide it.
Models trained on MSLS, sequence length 5:

Model | MSLS (R@1) | Download
---|---|---
CCT384 + SeqVLAD | 89.6 | [Link]
Once the datasets are ready, we can proceed to run the experiments with the architecture of choice.
NB: building MSLS sequences requires some heavy pre-processing to construct the underlying data structures. The dataset class caches them automatically, so they are computed only once: the first experiment you ever launch will take 2-3 hours to build these structures, which are saved in a cache directory, and subsequent experiments will start quickly. The cache stores everything with relative paths, so if you want to run experiments on multiple machines you can simply copy the cache directory.
Finally, note that these data structures must be computed for each sequence length, so the cache directory may end up holding one file for each sequence_length that you experiment with.
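The caching behaviour described above can be illustrated with a minimal sketch. This is not the repository's dataset class: the function names, the file naming scheme and the sliding-window pre-processing are all assumptions made for illustration. It only shows the idea of building a structure once per sequence length and reloading it afterwards.

```python
import pickle
import tempfile
from pathlib import Path


def build_sequences(frame_names, seq_length):
    """Stand-in for the expensive pre-processing: sliding windows of frames."""
    return [frame_names[i:i + seq_length]
            for i in range(len(frame_names) - seq_length + 1)]


def load_or_build(cache_dir, frame_names, seq_length):
    """Return (sequences, cache_hit), building and saving them on first use."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one cache file per sequence length.
    cache_file = cache_dir / f"sequences_len{seq_length}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f), True
    sequences = build_sequences(frame_names, seq_length)
    with cache_file.open("wb") as f:
        pickle.dump(sequences, f)
    return sequences, False


if __name__ == "__main__":
    # Only relative image paths are stored, so the cache stays portable.
    frames = [f"images/{i:04d}.jpg" for i in range(8)]
    with tempfile.TemporaryDirectory() as tmp:
        for length in (3, 5, 3):
            seqs, hit = load_or_build(tmp, frames, length)
            print(length, len(seqs), "loaded from cache" if hit else "built")
```

Because the cached entries hold relative paths only, copying the directory to another machine would let the second call path (the cache hit) run there without rebuilding.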
TODO one for each family of methods
Example with CCT-384 + SeqVLAD on MSLS:
python main_scripts/main_train.py \
--dataset_path <MSLS path> \
--img_shape 384 384 \
--arch cct384 --aggregation seqvlad \
--trunc_te 8 --freeze_te 1 \
--train_batch_size 4 --nNeg 5 --seq_length 5 \
--optim adam --lr 0.00001
Example with TimeSformer:
python main_scripts/main_train.py \
--dataset_path <MSLS path> \
--img_shape 224 224 \
--arch timesformer --aggregation _ \
--train_batch_size 4 --nNeg 5 --seq_length 5 \
--optim adam --lr 0.00001
Example with ResNet-18 + GeM + CAT:
python main_scripts/main_train.py \
--dataset_path <MSLS path> \
--img_shape 480 640 \
--arch r18l3 --pooling gem --aggregation cat \
--train_batch_size 4 --nNeg 5 --seq_length 5 \
--optim adam --lr 0.00001
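To make the CAT aggregation in the last example more concrete, here is a minimal sketch, assuming CAT denotes concatenating the per-frame descriptors of a sequence in temporal order. The function names and the final L2 normalization are assumptions for illustration, not the repository's implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cat_descriptor(frame_descriptors):
    """Concatenate per-frame descriptors in temporal order, then L2-normalize.

    The normalization step is an assumption made for this sketch.
    """
    flat = [x for descriptor in frame_descriptors for x in descriptor]
    return l2_normalize(flat)


if __name__ == "__main__":
    # Toy sequence: 5 frames, each with a 4-dimensional descriptor.
    sequence = [[float(i + j) for j in range(4)] for i in range(5)]
    descriptor = cat_descriptor(sequence)
    print(len(descriptor))  # seq_length * per-frame dimension
```

With this scheme the sequence descriptor grows linearly with the sequence length, which is one reason a dimensionality reduction step such as PCA can be useful for concatenated descriptors.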
For experiments on RobotCar, we did not change any hyperparameters with respect to the experiments on MSLS. You can therefore simply select the backbone-pooling-aggregation configuration that you want, as in the examples above, and then replace:
--dataset_path <MSLS path> with --dataset_path <Robotcar path>
Follow the instructions above to download the dataset
To add the PCA to SeqVLAD or CAT models use:
python main_scripts/evaluation.py \
--pca_outdim <descr. dim.> \
--resume <path trained model w/o PCA>
where the parameter --pca_outdim determines the final descriptor dimensionality (in our tests we used 4096).
It is possible to evaluate the trained models using:
python main_scripts/evaluation.py \
--resume <path trained model>
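The results table reports R@1, i.e. recall at rank 1. As a rough illustration of how such a metric is commonly computed, the sketch below counts a query as correct if any of its top-N retrieved database entries is a true positive. The function name and input layout are hypothetical, and how positives are defined (for example, a distance threshold) is left outside this sketch; the evaluation script may differ in its details.

```python
def recall_at_n(predictions, positives, n_values=(1, 5, 10)):
    """Compute Recall@N as a percentage of queries.

    predictions[q] is the ranked list of database indices retrieved for query q.
    positives[q] is the set of database indices that are correct for query q.
    """
    if not predictions:
        raise ValueError("at least one query is required")
    hits = {n: 0 for n in n_values}
    for ranked, correct in zip(predictions, positives):
        for n in n_values:
            # A query counts as a hit if any of its top-n results is correct.
            if any(idx in correct for idx in ranked[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(predictions) for n in n_values}


if __name__ == "__main__":
    # Toy example with three queries and made-up retrieval results.
    predictions = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
    positives = [{3}, {1}, {5}]
    print(recall_at_n(predictions, positives, n_values=(1, 3)))
```

In the toy example, only the first query has a correct match at rank 1, while the second query finds its match at rank 3 and the third never does.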
Deep Visual Geo-Localization Benchmark
Official SeqNet implementation
Official SeqMatchNet implementation
Here is the BibTeX entry to cite our paper:
@article{Mereu_2022_seqvlad,
author={Mereu, Riccardo and Trivigno, Gabriele and Berton, Gabriele and Masone, Carlo and Caputo, Barbara},
journal={IEEE Robotics and Automation Letters},
title={Learning Sequential Descriptors for Sequence-Based Visual Place Recognition},
year={2022},
volume={7},
number={4},
pages={10383-10390},
doi={10.1109/LRA.2022.3194310}
}