Unofficial re-implementation of the Event Sequence Generation Network (ESGN) from the paper "Streamlined Dense Video Captioning". Note that, unlike the original model, we do not use SST to encode the proposal-level features.
pip install -r requirement.txt
C3D features. Download the C3D feature file (sub_activitynet_v1-3.c3d.hdf5) from here, convert the h5 file into npy files, and place them in ./data/c3d.
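The conversion step is not scripted here. Below is a minimal Python sketch. It assumes h5py and NumPy are installed, that the HDF5 file holds one group per video id containing a `c3d_features` dataset (check this against your copy of the file), and that the data loader expects one `<video_id>.npy` file per video. The function names are hypothetical.

```python
import os

import numpy as np


def save_features_as_npy(items, out_dir):
    """Write each (video_id, feature_array) pair to out_dir/<video_id>.npy.

    Returns the number of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for video_id, feature in items:
        # np.save keeps the array's shape and dtype (e.g. [num_clips, feature_dim]).
        np.save(os.path.join(out_dir, f"{video_id}.npy"), np.asarray(feature))
        count += 1
    return count


def convert_c3d_hdf5(h5_path, out_dir="./data/c3d"):
    """Stream every video's C3D features out of the HDF5 file into .npy files."""
    import h5py  # imported here so save_features_as_npy works without h5py

    with h5py.File(h5_path, "r") as f:
        # Assumed layout: f[video_id]["c3d_features"] is a 2-D float array.
        # The generator is consumed inside the `with`, while the file is open.
        pairs = ((vid, f[vid]["c3d_features"][()]) for vid in f.keys())
        return save_features_as_npy(pairs, out_dir)
```

For example, convert_c3d_hdf5("sub_activitynet_v1-3.c3d.hdf5") writes the files to ./data/c3d and returns the number of videos converted. Reading one video at a time avoids loading the whole feature file into memory.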
Download the annotation files and the pre-generated proposal files (top-100 proposals generated by DBG) from Google Drive, and place them in ./data.
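Before training, you may want to check that every annotated video has a feature file. Below is a small Python sketch. It assumes the annotation JSON is a dictionary keyed by video id (as in ActivityNet Captions) and that features are stored as ./data/c3d/<video_id>.npy. The function name is hypothetical, and because the actual annotation file names are not listed here, you must pass in the path yourself.

```python
import json
import os


def missing_features(annotation_path, feature_dir="./data/c3d"):
    """Return the sorted video ids in annotation_path that lack <id>.npy in feature_dir."""
    with open(annotation_path) as f:
        # Assumed layout: {video_id: {...per-video annotation...}, ...}
        annotations = json.load(f)
    # Collect the video ids that already have a feature file.
    available = {
        os.path.splitext(name)[0]
        for name in os.listdir(feature_dir)
        if name.endswith(".npy")
    }
    return sorted(vid for vid in annotations if vid not in available)
```

An empty return value means every video in that annotation file has a matching feature file.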
Training
cfg_path=cfgs/esgn.yml
python train.py --cfg_path $cfg_path
The checkpoint files are saved in ./save.
Validation
python eval.py --eval_folder esgn_c3d_run0
Validation with re-ranking
python eval.py --eval_folder esgn_c3d_run0 --eval_esgn_rerank
Model | Proposal model | Avg. number of proposals | Avg. Recall (%) | Avg. Precision (%) | F1 (%) | Download |
---|---|---|---|---|---|---|
Original ESGN | SST | 2.85 | 55.58 | 57.57 | 56.66 | |
My reimpl. | DBG | 2.73 | 52.67 | 58.90 | 55.62 | url |
My reimpl. with re-ranking | DBG | 1.66 | 37.66 | 67.47 | 48.33 | |
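The F1 column appears to be the harmonic mean of the average precision and average recall, F1 = 2PR / (P + R). The short Python check below assumes that definition. For the two reimplementation rows it gives 55.61 and 48.34, which match the reported values to within the rounding of the two-decimal inputs. For the Original ESGN row, the same formula gives about 56.56 rather than 56.66.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (name, avg precision, avg recall) taken from the results table
rows = [
    ("My reimpl.", 58.90, 52.67),
    ("My reimpl. with re-ranking", 67.47, 37.66),
]
for name, p, r in rows:
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

The table also shows the trade-off made by re-ranking: fewer proposals per video raise precision but lower recall, and F1 drops overall.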
Download the pre-trained model and put it into ./save/esgn_c3d_run0, then run python eval.py --eval_folder esgn_c3d_run0.