This repository contains the PyTorch implementation of the paper DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval. It provides code for the knowledge distillation training of coarse- and fine-grained student networks based on similarities calculated by a teacher and a selector network. It also includes scripts for training the selector network. Finally, to facilitate the reproduction of the paper's results, we provide the evaluation code, the extracted features for the employed video datasets, and pretrained networks for the various students and selectors.
```bash
git clone https://github.com/mever-team/distill-and-select
cd distill-and-select
pip install -r requirements.txt
```

or

```bash
conda install --file requirements.txt
```
We provide our extracted features for all datasets to facilitate reproducibility for future research.
Download the feature files of the dataset you want:
All feature files are in HDF5 format.
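To get a first look at a downloaded feature file, you can list its contents with `h5py`. The exact group/dataset layout inside the provided files is an assumption here (adapt the keys after listing them); for a self-contained illustration, the snippet below first creates a small dummy file with one entry, where `video_0001` and the feature shape are hypothetical:

```python
import h5py
import numpy as np

# Create a small dummy feature file for illustration (one hypothetical video
# with 42 frames, 9 regions, and 512-dimensional region descriptors).
with h5py.File("dummy_features.hdf5", "w") as f:
    f.create_dataset("video_0001",
                     data=np.random.rand(42, 9, 512).astype(np.float32))

# Inspect the file: list stored video ids and load their features.
with h5py.File("dummy_features.hdf5", "r") as f:
    for video_id in f.keys():
        feats = f[video_id][:]  # load the dataset as a numpy array
        print(video_id, feats.shape, feats.dtype)
```

Running the loop against the actual downloaded files shows the real video ids and feature shapes used for training.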
We provide the code for training and evaluation of our student models.
To train a fine-grained student, run `train_student.py` with `fine-grained` as the value of the `--student_type` argument, as in the following command:

```bash
python train_student.py --student_type fine-grained --experiment_path experiments/DnS_students --trainset_hdf5 /path/to/dns_100k.hdf5
```
You can train an attention or a binarization fine-grained student by setting the `--attention` or the `--binarization` flag to `true`, respectively.

For fine-grained attention students:

```bash
python train_student.py --student_type fine-grained --binarization false --attention true --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5
```

For fine-grained binarization students:

```bash
python train_student.py --student_type fine-grained --binarization true --attention false --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5
```
To train a coarse-grained student, provide `coarse-grained` to the `--student_type` argument:

```bash
python train_student.py --student_type coarse-grained --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5 --attention true --learning_rate 1e-5
```
Provide one of `teacher`, `fg_att_student_iter1`, or `fg_att_student_iter2` to the `--teacher` argument in order to train a student with a different teacher:

```bash
python train_student.py --teacher fg_att_student_iter2 --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5
```
You can optionally perform validation on FIVR-5K by providing its HDF5 file to the `--val_hdf5` argument and choosing one of the DSVR, CSVR, or ISVR sets with the `--val_set` argument:

```bash
python train_student.py --student_type coarse-grained --val_hdf5 /path/to/fivr_5k.hdf5 --val_set ISVR --experiment_path /path/to/experiment/ --trainset_hdf5 /path/to/dns_100k.hdf5 --learning_rate 1e-5
```
Choose one of the FIVR-5K, FIVR-200K, CC_WEB_VIDEO, SVD, or EVVE datasets to evaluate your models.
For the evaluation of the students, run the `evaluation_student.py` script, providing the path to the `.pth` model via the `--student_path` argument, as in the following command:

```bash
python evaluation_student.py --student_path experiments/DnS_students/model_fg_att_student.pth --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5
```
If you don't pass any value to `--student_path`, a pretrained model will be selected:

```bash
python evaluation_student.py --student_type fine-grained --attention true --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5
```
We also provide the code for the training of the selector network and the evaluation of our overall DnS framework.
To train a selector network, run `train_selector.py` as in the following command:

```bash
python train_selector.py --experiment_path experiments/DnS_students --trainset_hdf5 /path/to/dns_100k.hdf5
```
Provide different values to the `--threshold` argument to train the selector network with different label functions.
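To make the role of the threshold concrete, here is a minimal sketch of one plausible label function, assuming (this is our reading, not the repository's exact code) that a pair is labeled as needing fine-grained re-ranking when the gap between the fine-grained and coarse similarities exceeds the threshold:

```python
def selector_label(fine_sim, coarse_sim, threshold=0.2):
    """Hypothetical label function: 1 if the coarse student's similarity
    deviates from the fine-grained similarity by more than the threshold
    (i.e., the pair should be sent for fine-grained re-ranking)."""
    return 1 if abs(fine_sim - coarse_sim) > threshold else 0

# A pair where the coarse student disagrees strongly with the fine-grained
# similarity gets label 1; a pair where they roughly agree gets label 0.
print(selector_label(0.9, 0.4))   # large gap  -> 1
print(selector_label(0.5, 0.45))  # small gap  -> 0
```

A larger `--threshold` therefore labels fewer pairs as positives, making the selector more conservative about dispatching pairs to the fine-grained student.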
For the evaluation of the DnS framework, run the `evaluation_dns.py` script, providing the path to the `.pth` model via the corresponding network arguments, as in the following command:

```bash
python evaluation_dns.py --selector_network_path experiments/DnS_students/model_selector_network.pth --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5
```
If you don't pass any value to a network path argument, the corresponding pretrained model will be selected. E.g., to evaluate DnS with the Fine-grained Attention Student:

```bash
python evaluation_dns.py --attention true --dataset FIVR-5K --dataset_hdf5 /path/to/fivr_200k.hdf5
```
Provide different values to the `--percentage` argument to send a different number of video pairs for re-ranking to the fine-grained student. Given the value `all`, it runs the evaluation for all dataset percentages.
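The selection logic behind `--percentage` can be sketched as follows. This is an illustrative simplification, not the repository's implementation: given selector scores for all video pairs, the top-scoring fraction is dispatched to the fine-grained student for re-ranking, while the rest keep their coarse similarities.

```python
def rerank_split(selector_scores, percentage):
    """Return the indices of the pairs sent for fine-grained re-ranking:
    the `percentage` fraction of pairs with the highest selector scores."""
    n_rerank = int(len(selector_scores) * percentage)
    ranked = sorted(range(len(selector_scores)),
                    key=lambda i: selector_scores[i], reverse=True)
    return set(ranked[:n_rerank])

scores = [0.9, 0.1, 0.7, 0.3]
to_rerank = rerank_split(scores, 0.5)  # the two highest-scoring pairs (indices 0 and 2)
```

A higher percentage trades indexing/search speed for accuracy, since more pairs pass through the expensive fine-grained student.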
We also provide our pretrained models trained with the `fg_att_student_iter2` teacher.
```python
from model.feature_extractor import FeatureExtractor
from model.students import FineGrainedStudent, CoarseGrainedStudent
from model.selector import SelectorNetwork

feature_extractor = FeatureExtractor(dims=512).eval()
fg_att_student = FineGrainedStudent(pretrained=True, attention=True).eval()
fg_bin_student = FineGrainedStudent(pretrained=True, binarization=True).eval()
cg_student = CoarseGrainedStudent(pretrained=True).eval()
selector_att = SelectorNetwork(pretrained=True, attention=True).eval()
selector_bin = SelectorNetwork(pretrained=True, binarization=True).eval()
```
* First, extract video features by providing a video tensor to the feature extractor (similar to [here](https://github.com/MKLab-ITI/visil/tree/pytorch#use-visil-in-your-python-code)):

```python
video_features = feature_extractor(video_tensor)
```
* Use the `index_video()` function, providing the video features, to extract video representations for the student and selector networks:

```python
fg_features = fg_att_student.index_video(video_features)
cg_features = cg_student.index_video(video_features)
sn_features = selector_att.index_video(video_features)
```
* Use the `calculate_video_similarity()` function, providing query and target features, to calculate the similarity based on the student networks:

```python
fine_similarity = fg_att_student.calculate_video_similarity(query_fg_features, target_fg_features)
coarse_similarity = cg_student.calculate_video_similarity(query_cg_features, target_cg_features)
```
* To calculate the selector's score for a video pair, call the selector network, providing the features extracted for each video and their coarse similarity:

```python
selector_features = torch.cat([query_sn_features, target_sn_features, coarse_similarity], 1)
selector_scores = selector_att(selector_features)
```
If you use this code for your research, please consider citing our papers:
```bibtex
@article{kordopatis2022dns,
  title={{DnS}: {Distill-and-Select} for Efficient and Accurate Video Indexing and Retrieval},
  author={Kordopatis-Zilos, Giorgos and Tzelepis, Christos and Papadopoulos, Symeon and Kompatsiaris, Ioannis and Patras, Ioannis},
  journal={International Journal of Computer Vision},
  year={2022}
}
```

```bibtex
@inproceedings{kordopatis2019visil,
  title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}
```
* ViSiL - here you can find our teacher model
* FIVR-200K - download our FIVR-200K dataset
This work has been supported by the WeVerify and MediaVerse projects, partially funded by the European Commission under contract numbers 825297 and 957252, respectively, and by the DECSTER project, funded by EPSRC under contract number EP/R025290/1.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Giorgos Kordopatis-Zilos (georgekordopatis@iti.gr)