zhoubolei / moments_models

Pretrained models trained on the Moments in Time dataset
BSD 2-Clause "Simplified" License

Pretrained models for Moments in Time Dataset

We release pretrained models trained on the Moments in Time dataset.

Download the Models

Models

We provide a 3D ResNet50 (inflated from a 2D RGB model) trained on 16-frame inputs sampled at 5 fps.
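To make the input format concrete, here is a minimal sketch of how 16 frame indices at 5 fps could be picked from a decoded video. It assumes the video has already been decoded into frames at a known native frame rate. The function name `sample_frame_indices` and the clamping of out-of-range indices to the last frame are illustrative assumptions, not this repository's data loader. At 5 fps, 16 frames span 16 / 5 = 3.2 seconds.

```python
import math


def sample_frame_indices(native_fps, total_frames, num_frames=16, target_fps=5.0):
    """Pick num_frames frame indices spaced to approximate target_fps.

    Hypothetical helper: indices past the end of the video are clamped to
    the last frame, which is an assumption (a real loader may pad or loop).
    """
    step = native_fps / target_fps  # native frames between two sampled frames
    indices = [math.floor(i * step) for i in range(num_frames)]
    return [min(idx, total_frames - 1) for idx in indices]


if __name__ == "__main__":
    # A 3-second clip decoded at 30 fps has 90 frames.
    print(sample_frame_indices(native_fps=30.0, total_frames=90))
    print("clip span (s):", 16 / 5.0)
```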

The model has recently been updated to 305 classes and achieves the following performance on the MiT-V2 dataset:

Top-1    Top-5
28.4%    54.5%
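Top-1 and Top-5 accuracy count a video as correct when its ground-truth class is the highest-scoring class, or among the five highest-scoring classes, respectively. A minimal pure-Python sketch of that computation for single-label predictions follows; the helper `topk_accuracy` and the toy scores are hypothetical, not part of this repository.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: one list of per-class scores per video.
    labels: the ground-truth class index of each video.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(topk_accuracy(scores, labels, k=1), topk_accuracy(scores, labels, k=2))
```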

The 3D model can be downloaded and run using the following command:

    python test_video.py --video_file path/to/video.mp4 --arch resnet3d50

If you use any of these files, please cite our Moments paper (https://arxiv.org/abs/1801.03150).

We now include the Multi-label Moments (M-MiT) 3D ResNet50 model, the Broden dataset with action regions, and loss implementations including wLSEP. If you use any of these files, please cite our Multi-Moments paper (https://arxiv.org/abs/1911.00232).

The multi-label model has also recently been updated to 305 classes and achieves the following performance on the M-MiT-V2 dataset:

Top-1    Top-5    micro mAP    macro mAP
59.4%    81.7%    62.4         39.4
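Micro mAP pools every (video, class) score into a single ranking before computing average precision (AP), so frequent classes dominate; macro mAP computes AP separately for each class and then averages, so rare classes weigh as much as frequent ones. The sketch below illustrates the difference using the common "mean precision at the rank of each positive" definition of AP. The function names are hypothetical, and the authors' evaluation code may differ, for example in how tied scores are handled.

```python
def average_precision(scores, targets):
    """AP for one ranking: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, targets), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def micro_map(scores, targets):
    """Pool every (video, class) pair into one ranking, then take its AP."""
    flat_s = [s for row in scores for s in row]
    flat_t = [t for row in targets for t in row]
    return average_precision(flat_s, flat_t)


def macro_map(scores, targets):
    """Average the per-class AP over classes that have at least one positive."""
    per_class = []
    for c in range(len(scores[0])):
        ap = average_precision([row[c] for row in scores], [row[c] for row in targets])
        if ap is not None:
            per_class.append(ap)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.8, 0.6], [0.3, 0.7]]  # 3 videos x 2 classes
    targets = [[1, 0], [0, 1], [1, 1]]             # multi-label ground truth
    print(micro_map(scores, targets), macro_map(scores, targets))
```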

The 3D M-MiT model can be downloaded and run using the following command:

    python test_video.py --video_file path/to/video.mp4 --arch resnet3d50 --multi
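To run either model over a whole folder of videos, a small wrapper can call the test_video.py command above once per file. This is a hypothetical helper (saved, say, as run_all.py): it only uses the `--video_file`, `--arch resnet3d50` and `--multi` flags documented here, and it assumes it is launched from the directory that contains test_video.py.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video_file, multi=False):
    """Return the argv for one test_video.py run, using only the documented flags."""
    cmd = [sys.executable, "test_video.py",
           "--video_file", str(video_file),
           "--arch", "resnet3d50"]
    if multi:
        cmd.append("--multi")  # select the multi-label M-MiT model
    return cmd


def run_folder(folder, multi=False):
    """Run test_video.py on every .mp4 file in folder, in sorted order."""
    for video in sorted(Path(folder).glob("*.mp4")):
        print(f"== {video}", flush=True)
        # check=True stops the loop if test_video.py exits with an error
        subprocess.run(build_command(video, multi), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_all.py path/to/videos [--multi]
    run_folder(sys.argv[1], multi="--multi" in sys.argv[2:])
```

Shelling out keeps the wrapper independent of test_video.py's internals, at the cost of reloading the model for every video.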

We have also uploaded a Python file with our PyTorch implementations of the loss functions used in our Multi-Moments paper (https://arxiv.org/abs/1911.00232).
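For orientation, wLSEP is a weighted form of the LSEP (log-sum-exp pairwise) ranking loss, which penalizes every pair of a negative class v and a positive class u by how far the score f_v is from falling below f_u: loss = log(1 + Σ_{v∉Y} Σ_{u∈Y} exp(f_v − f_u)). The sketch below is the plain, unweighted LSEP for a single sample in pure Python; it does not reproduce the weighting that defines wLSEP, so refer to the uploaded loss file and the paper for that. The function name `lsep_loss` is illustrative.

```python
import math


def lsep_loss(scores, positives):
    """Unweighted LSEP loss for one sample.

    scores: list of raw class scores f_c(x).
    positives: set of indices of the ground-truth classes Y.
    """
    pos = [scores[c] for c in positives]
    neg = [s for c, s in enumerate(scores) if c not in positives]
    # One margin f_v - f_u per (negative v, positive u) pair.
    diffs = [f_v - f_u for f_v in neg for f_u in pos]
    # log(1 + sum(exp(d))) evaluated stably as a log-sum-exp over [0, *diffs].
    terms = [0.0] + diffs
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    # Classes 0 and 2 are present in the video; class 1 is not.
    print(lsep_loss([2.0, -1.0, 1.5], {0, 2}))
```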

To run NetDissect on Moments models, download the Broden dataset with action regions:

Clone the TRN (Temporal Relation Networks) repo and download the pretrained TRN model:

    git clone --recursive https://github.com/metalbubble/TRN-pytorch
    cd TRN-pytorch/pretrain
    ./download_models.sh
    cd ../sample_data
    ./download_sample_data.sh
    cd ..

Test the pretrained model on the sample video (Bolei is juggling ;-]!)


    python test_video.py --arch InceptionV3 --dataset moments \
        --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
        --frame_folder sample_data/bolei_juggling

    RESULT ON sample_data/bolei_juggling
    0.982 -> juggling
    0.003 -> flipping
    0.003 -> spinning
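Each result line pairs a class probability with a label, sorted from most to least likely. The sketch below produces lines in that format from raw class scores, assuming the scores are logits turned into probabilities with a softmax. The logits and the four-label list in the usage example are made up, and the TRN test script may compute its scores differently.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_topk(logits, class_names, k=3):
    """Return 'prob -> label' lines for the k most probable classes."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [f"{probs[i]:.3f} -> {class_names[i]}" for i in order]


if __name__ == "__main__":
    names = ["juggling", "flipping", "spinning", "running"]  # made-up label subset
    for line in format_topk([5.0, 0.0, 0.0, -1.0], names):
        print(line)
```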

References

Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva. Moments in Time Dataset: One Million Videos for Event Understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

Mathew Monfort, Kandan Ramakrishnan, Alex Andonian, Barry A. McNamara, Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva. Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding. arXiv preprint arXiv:1911.00232, 2019.

Acknowledgements

The project is supported by the MIT-IBM Watson AI Lab, IBM Research, the SystemsThatLearn@CSAIL/Ignite Grant, and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00341.