
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval

This is our implementation for the paper:

Ning Han, Jingjing Chen, Chuhao Shi, Yawen Zeng, Guangyi Xiao, and Hao Chen. 2022. BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval.

Environment Settings

The code is implemented with the PyTorch framework.

Training

Follow the instructions below to train your own model.

Run train.py to train and save models:

python train.py --cuda --is_train --dataset=msr-vtt --data_split=9000 --layer_num=4 --log_dir=./data/runs/xxx --dataroot=./data/MSR-VTT 
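Text-video retrieval models of this kind are typically trained with a bidirectional max-margin ranking loss over a batch similarity matrix. As an illustration only (this is not the repository's actual loss, and the margin value is an assumption), a minimal sketch:

```python
def bidirectional_ranking_loss(sim, margin=0.2):
    """Max-margin ranking loss over a batch similarity matrix.

    sim[i][j] scores text i against video j; the diagonal entries
    sim[i][i] are the positive (matched) pairs. margin=0.2 is an
    illustrative default, not a value taken from the paper.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # text i should score its own video above any other video j ...
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # ... and video i should score its own text above any other text j
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / (n * n)
```

With well-separated positives (high diagonal, low off-diagonal) every hinge term is zero and the loss vanishes; mismatched pairs that outscore the true pair contribute positive penalty.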

Evaluation

Run eval.py to evaluate a trained model:

python eval.py --cuda --checkpoint=./models/ckpt_best.pth
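Text-video retrieval is usually reported with Recall@K and median rank computed from the text-by-video similarity matrix. A minimal sketch of how these metrics are derived (assuming, as is standard, that the ground-truth video for text i sits at index i; this is not the repository's actual evaluation code):

```python
def retrieval_metrics(sim):
    """Compute R@1/5/10 and median rank from a similarity matrix.

    sim[i][j] scores text i against video j; the assumed ground-truth
    match for text i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # rank = number of videos scored strictly higher than the true match
        ranks.append(sum(1 for s in row if s > row[i]))
    n = len(ranks)
    metrics = {f"R@{k}": 100.0 * sum(r < k for r in ranks) / n
               for k in (1, 5, 10)}
    metrics["MedR"] = sorted(ranks)[n // 2] + 1  # 1-indexed median rank
    return metrics
```

Higher R@K and lower MedR indicate better retrieval quality.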

Example to get the results

Experiment logs and records are saved under ./data/runs/xxx

Dataset

We use three datasets in our paper: MSR-VTT, MSVD, and YouCook2. Download the processed video and text features of MSR-VTT (code: pbvc), MSVD (code: 5p0y), and YouCook2_BB, and save them in the ./data folder.

Last Update Date: May 29, 2022