This repo is the official PyTorch implementation of the paper:
"Long-Short Temporal Co-Teaching for Weakly Supervised Video Anomaly Detection"
Please install the dependencies listed in requirements.txt.
You can use a pre-trained I3D model (such as pytorch-resnet3d) or a C3D model for feature extraction.
You can also download the extracted I3D features from the links below:
ShanghaiTech I3D features (code:8XJB) | ShanghaiTech I3D features (code:KV44)
UCF-Crime I3D features (code:6EB8) | UCF-Crime I3D features (code:344D)
UBnormal I3D features (code:PYL5) | UBnormal I3D features (code:34A4)
Taking ShanghaiTech as an example, run the following commands:
python spatio_transformer_shanghaitech.py --encoder_weight_init --regressor_weight_init --FFN_layerNorm --MHA_dropout 0.3 --FFN_dropout 0.3 --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
Generating the pseudo labels of the spatio-transformer:
python pseudo_labels_generator_spatio.py --dataset SHT --n_patch 16 --FFN_layerNorm --threshold 0.9 --pseudo_labels_path STN_pseudo_labels.npy --training_txt SH_Train_new.txt --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
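The `--threshold 0.9` flag suggests that per-snippet anomaly scores are binarized into pseudo labels. The sketch below illustrates that idea in plain Python. It assumes that a snippet counts as anomalous when its score reaches the threshold and that videos with a normal video-level label keep all-zero labels. The function name `binarize_scores` and the input layout are hypothetical and are not taken from `pseudo_labels_generator_spatio.py`, which may work differently.

```python
from typing import Dict, List


def binarize_scores(
    scores: Dict[str, List[float]],
    video_is_anomalous: Dict[str, bool],
    threshold: float = 0.9,
) -> Dict[str, List[int]]:
    """Turn per-snippet anomaly scores into 0/1 pseudo labels (illustrative only)."""
    pseudo_labels = {}
    for video, snippet_scores in scores.items():
        if not video_is_anomalous[video]:
            # Assumption: a video labeled normal contains no anomalous snippets.
            pseudo_labels[video] = [0] * len(snippet_scores)
        else:
            # Assumption: a snippet is anomalous when its score reaches the threshold.
            pseudo_labels[video] = [int(s >= threshold) for s in snippet_scores]
    return pseudo_labels


if __name__ == "__main__":
    # Made-up scores for two hypothetical videos.
    demo_scores = {"video_a": [0.10, 0.95, 0.92], "video_b": [0.97, 0.20]}
    demo_flags = {"video_a": True, "video_b": False}
    print(binarize_scores(demo_scores, demo_flags))
```

Under these assumptions, a higher threshold yields fewer but more confident anomalous pseudo labels.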
Training the temporal-transformer with the spatio-transformer pseudo labels:
python temporal_transformer_shanghaitech.py --part_len 3 --MHA_layerNorm --FFN_layerNorm --relative_position_encoding --pseudo_labels_path STN_pseudo_labels.npy --dataset_path SHT_I3D_16PATCH.h5 --gpu 0
Generating the pseudo labels of the temporal-transformer:
python pseudo_labels_generator_temporal.py --dataset SHT --relative_position_encoding --n_hidden 4096 --n_patch 16 --n_head 8 --d_k 256 --d_v 256 --part_len 3 --MHA_layerNorm --FFN_layerNorm --dataset_path SHT_I3D_16PATCH.h5 --temporal_model_path temporal_model --classifier_model_path classifier_model --pseudo_labels_path LTN_pseudo_labels.npy --training_txt SH_Train_new.txt --threshold 0.65 --gpu 0
For multi-GPU training, add the following flags to the command:
--data_parallel --gpu id0,id1
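As an illustration of how flags of this form are commonly handled, the following sketch parses `--data_parallel` and a comma-separated `--gpu` value with `argparse`. The parser and the helper `parse_gpu_ids` are hypothetical; the repository's scripts may read these flags differently.

```python
import argparse
from typing import List, Optional


def parse_gpu_ids(value: str) -> List[int]:
    """Convert a comma-separated string such as "0,1" into [0, 1]."""
    return [int(part) for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    parser.add_argument("--gpu", type=parse_gpu_ids, default=[0],
                        help="comma-separated GPU ids, e.g. 0,1")
    parser.add_argument("--data_parallel", action="store_true",
                        help="spread each batch across all listed GPUs")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.data_parallel and len(args.gpu) < 2:
        # With a single device there is nothing to parallelize over.
        print("warning: --data_parallel given with only one GPU id")
    return args


if __name__ == "__main__":
    args = parse(["--data_parallel", "--gpu", "0,1"])
    print(args.data_parallel, args.gpu)
```

Here `id0,id1` from the flags above corresponds to integer device ids such as `0,1`.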
ShanghaiTech (code:L958) | ShanghaiTech (code:3UJ9)
For ShanghaiTech:
python evaluation_shanghaitech_ubnormal.py --dataset SHT --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_relative_position_encoding --dataset_path SHT_I3D_16PATCH.h5 --temporal_model_path shanghaitech_temporal_model_oneCrop_I3D_RGB_0.9779.ckpt --classifier_model_path shanghaitech_classifier_model_oneCrop_I3D_RGB_0.9779.ckpt --gpu 0
For UBnormal:
python evaluation_shanghaitech_ubnormal.py --dataset UBnormal --d_model 1024 --part_len 5 --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_relative_position_encoding --dataset_path UBnormal_I3D_16PATCH.h5 --temporal_model_path UBnormal_temporal_model_oneCrop_I3D_RGB_0.7551.ckpt --classifier_model_path UBnormal_classifier_model_oneCrop_I3D_RGB_0.7551.ckpt --test_mask_dir data/UBnormal/test_frame_mask --training_txt data/UBnormal/train_video_names_frames.txt --testing_txt data/UBnormal/test_video_names_frames.txt --gpu 0
For UCF-Crime:
python evaluation_UCF.py --n_patch 9 --part_num 32 --part_len 2 --dataset_path UCF_I3D_9PATCH.h5 --temporal_MHA_layerNorm --temporal_FFN_layerNorm --temporal_model_path UCF_temporal_model_oneCrop_I3D_RGB_0.8570.ckpt --classifier_model_path UCF_classifier_model_oneCrop_I3D_RGB_0.8570.ckpt --relative_position_encoding --gpu 0
Tip: If the model was trained in multi-GPU mode, you must also add the flag
--data_parallel
at the inference stage.
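A likely reason for this requirement is that PyTorch's `nn.DataParallel` wraps the network in a `module` attribute, so parameter names saved from the wrapped model carry a `module.` prefix that an unwrapped model does not expect. The sketch below shows that naming mismatch with plain dictionaries standing in for state dicts. The key names are made up for illustration, and whether the repository's checkpoints are saved this way is an assumption.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def strip_module_prefix(state_dict: Dict[str, T]) -> Dict[str, T]:
    """Remove the leading "module." that nn.DataParallel adds to parameter names."""
    prefix = "module."
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    # Hypothetical keys; real checkpoints map such names to tensors.
    multi_gpu_keys = {"module.encoder.weight": 1, "module.encoder.bias": 2}
    print(sorted(strip_module_prefix(multi_gpu_keys)))
```

Renaming keys like this is a generic workaround and not part of this repo; passing `--data_parallel` at inference instead wraps the model the same way as during training.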
This repo is released under the MIT License.
If this repo is useful for your research, please consider citing our paper:
@inproceedings{sun2023long,
title={Long-short temporal co-teaching for weakly supervised video anomaly detection},
author={Sun, Shengyang and Gong, Xiaojin},
booktitle={2023 IEEE International Conference on Multimedia and Expo (ICME)},
pages={2711--2716},
year={2023},
organization={IEEE}
}
Parts of the code are based on MIST; we sincerely thank its authors for their contributions.