This is the implementation of the NeurIPS 2023 paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval". It is built on top of CLIP4Clip and X-CLIP.
2024.3.9: The image-text retrieval code has been updated. You can find it in the Image-Text-Retrieval/ directory.
We recommend installing the dependencies listed in requirements.txt:
pip install -r requirements.txt
We use the same data splits as CLIP4Clip; see its data preparation guide for details.
MSR-VTT
The official data and videos can be found here.
You can download the splits and captions with:
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip
MSVD
The raw videos can be found here.
You can download the splits and captions with:
wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip
DiDeMo
The raw videos can be found here, and the splits can be found here.
We provide trained checkpoints for evaluation. You can download the model trained on MSR-VTT here, the model trained on MSVD here, and the model trained on DiDeMo here.
To train a model, set ${DATA_PATH} in the corresponding script to the path of your dataset and ${SAVE_PATH} to the directory where checkpoints should be saved.
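If you prefer to set these paths from the command line rather than opening each script in an editor, a small helper can rewrite them in place. This is a minimal sketch that assumes each script assigns its variables on their own lines in the form NAME=value (for example DATA_PATH=...); the helper name set_script_var and the example paths are hypothetical, so check the actual scripts before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): replace the value of a
# "NAME=value" line in a script. Assumes the assignment starts at the beginning
# of a line; values containing '|' or '&' are not escaped.
set_script_var() {
    file=$1
    name=$2
    value=$3
    # Fail early if the script does not define the variable in the assumed form.
    if ! grep -q "^${name}=" "$file"; then
        echo "no '${name}=' line found in ${file}" >&2
        return 1
    fi
    # Write to a temporary file and move it back (portable alternative to sed -i).
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example usage (placeholder paths):
# set_script_var scripts/run_msrvtt.sh DATA_PATH /data/msrvtt
# set_script_var scripts/run_msrvtt.sh SAVE_PATH ./checkpoints/msrvtt
```

Rewriting through a temporary file instead of sed -i keeps the helper working with both GNU and BSD sed; since the scripts are launched with sh, losing the executable bit on the rewritten file does not matter.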
Tip: ${do_rerank_learn} controls whether the re-ranking beta parameters are learned automatically after each training run, which takes extra time. You can remove it to speed up validation.
MSR-VTT
sh scripts/run_msrvtt.sh
MSVD
sh scripts/run_msvd.sh
DiDeMo
sh scripts/run_didemo.sh
To find the best re-ranking beta parameters (this may take more time), set ${DATA_PATH} to the path of your dataset and ${SAVE_PATH} to the directory where checkpoints should be saved.
You can construct the beta learning set freely, but preferably from data that were not used during model training. By default, the validation set is used as the beta learning set.
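One way to check that a custom beta learning set is disjoint from the training data is to compare the video IDs of the two splits. The sketch below assumes you have exported the IDs of each split into plain-text files with one ID per line; the helper count_overlap and the file names are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: print how many IDs appear in both split files
# (one ID per line). An output of 0 means the two sets are disjoint.
count_overlap() {
    sorted_train=$(mktemp)
    sorted_beta=$(mktemp)
    # comm requires sorted input; -u also drops duplicate IDs within a file.
    sort -u "$1" > "$sorted_train"
    sort -u "$2" > "$sorted_beta"
    # comm -12 keeps only lines common to both files; tr strips wc padding on BSD/macOS.
    comm -12 "$sorted_train" "$sorted_beta" | wc -l | tr -d ' '
    rm -f "$sorted_train" "$sorted_beta"
}

# Example usage (placeholder file names):
# count_overlap train_video_ids.txt beta_video_ids.txt
```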
MSR-VTT
sh scripts/run_msrvtt_learn.sh
MSVD
sh scripts/run_msvd_learn.sh
DiDeMo
sh scripts/run_didemo_learn.sh
To evaluate a trained model, set ${DATA_PATH} to the path of your dataset, ${SAVE_PATH} to the directory where checkpoints should be saved, and ${MODEL_PATH} to the checkpoint to be loaded. ${rerank_coe_v} and ${rerank_coe_t} are the re-ranking parameters ($\beta_1$, $\beta_2$) obtained in the beta learning process.
MSR-VTT
sh scripts/run_msrvtt_eval.sh
MSVD
sh scripts/run_msvd_eval.sh
DiDeMo
sh scripts/run_didemo_eval.sh
If you find this code useful, please cite the following paper:
@inproceedings{PAU,
author = {Hao Li and
Jingkuan Song and
Lianli Gao and
Xiaosu Zhu and
Heng Tao Shen},
title = {Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval},
booktitle = {NeurIPS},
year = {2023}
}