This is the official implementation of the Transpotter paper. The code has been tested with Python version 3.6.8. Pre-trained checkpoints are also released.
pip install -r requirements.txt
cd checkpoints/
sh download_models.sh
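As a quick sanity check after the download, a minimal sketch like the one below reports which of the files used by the evaluation commands are absent. The file names are taken from the commands in this README. It is an assumption that download_models.sh places all four of them directly in checkpoints/, and the helper name missing_checkpoints is hypothetical, not part of this repository.

```python
from pathlib import Path

# File names taken from the evaluation commands in this README.
# Assumption: download_models.sh places all of them directly in checkpoints/.
EXPECTED_FILES = ["lrs2_test.pkl", "lrs3_test.pkl", "ft_lrs2.pth", "ft_lrs3.pth"]


def missing_checkpoints(ckpt_dir="checkpoints"):
    """Return the expected file names that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected checkpoint files found.")
```

Run it from the repository root (not from inside checkpoints/), or pass the directory explicitly to missing_checkpoints.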
Please follow the steps in this repository to extract the features for the LRS2 and LRS3 test sets, using the model trained on LRS2 + LRS3 for the feature extraction. The provided code and pre-trained models work with these features.
The following commands compute the scores reported in the last row of Table 1 of the paper:
# LRS3
python test_and_score.py --data_root /path/to/lrs3/test/ --test_pkl_file checkpoints/lrs3_test.pkl --ckpt_path checkpoints/ft_lrs3.pth --localization
# LRS2
python test_and_score.py --data_root /path/to/lrs2/vid/ --test_pkl_file checkpoints/lrs2_test.pkl --ckpt_path checkpoints/ft_lrs2.pth --localization
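To run both datasets in one go, once without --localization and once with it, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the script name, flags, and checkpoint paths are copied from the commands above, while the helper names build_command and run_all are hypothetical and the data_root values are placeholders that must be replaced with the actual feature directories.

```python
import subprocess
import sys

# Arguments copied from the commands in this README.
# The data_root values are placeholders; point them at the extracted features.
DATASETS = {
    "lrs3": {
        "data_root": "/path/to/lrs3/test/",
        "test_pkl_file": "checkpoints/lrs3_test.pkl",
        "ckpt_path": "checkpoints/ft_lrs3.pth",
    },
    "lrs2": {
        "data_root": "/path/to/lrs2/vid/",
        "test_pkl_file": "checkpoints/lrs2_test.pkl",
        "ckpt_path": "checkpoints/ft_lrs2.pth",
    },
}


def build_command(dataset, localization):
    """Build the test_and_score.py argument list for one dataset."""
    cfg = DATASETS[dataset]
    cmd = [
        sys.executable, "test_and_score.py",
        "--data_root", cfg["data_root"],
        "--test_pkl_file", cfg["test_pkl_file"],
        "--ckpt_path", cfg["ckpt_path"],
    ]
    if localization:
        # Appended only for the localization run.
        cmd.append("--localization")
    return cmd


def run_all():
    """Run every dataset without and then with --localization."""
    for dataset in DATASETS:
        for localization in (False, True):
            subprocess.run(build_command(dataset, localization), check=True)
```

Calling run_all() from the repository root launches the four evaluations in sequence and stops at the first one that exits with an error.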
The --localization flag is only used to compute $mAP^{loc}$. The other metrics can be computed by omitting this flag.
Please cite the following paper if you find our work useful:
@inproceedings{prajwal2021visual,
title={Visual Keyword Spotting with Attention},
author={Prajwal, KR and Momeni, Liliane and Afouras, Triantafyllos and Zisserman, Andrew},
booktitle={BMVC},
year={2021}
}
We thank the author of The Annotated Transformer for the Transformer implementation.