
Emodiversity

This is the repository for "How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios" by Mantas Mazeika*, Eric Tang*, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, and Dan Hendrycks*.

The Video Cognitive Empathy (VCE) and Video to Valence (V2V) datasets are available here.

This repository contains submodules. To clone the full repository along with submodules (required for reproducing training/results), please use

git clone --recurse-submodules https://github.com/hendrycks/emodiversity.git

Video Cognitive Empathy (VCE) dataset

The VCE dataset contains 61,046 videos, each labelled for the intensity of 27 emotions. The dataset is split into

  1. train: 50,000 videos
  2. test: 11,046 videos

The dataset is structured as follows:

vce_dataset/
├── metadata.json
├── train_labels.json
├── test_labels.json
├── videos/00000.mp4
├── videos/00001.mp4
...

where metadata.json stores dataset metadata, train_labels.json and test_labels.json store the emotion intensity labels for the train and test splits, and videos/ contains the MP4 files.
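
As a rough illustration, the following Python sketch loads the training labels and inspects one entry. The JSON layout is assumed here to be a mapping from video IDs to their 27 emotion-intensity scores; treat the field names and shapes as placeholders and check the files on disk.

import json
import os

# Hypothetical path to the extracted VCE dataset; adjust for your system.
VCE_ROOT = "vce_dataset"

# Load the training labels (assumed: a JSON object keyed by video ID).
with open(os.path.join(VCE_ROOT, "train_labels.json")) as f:
    train_labels = json.load(f)

# Inspect one entry: the video file and its emotion-intensity scores.
video_id, label = next(iter(train_labels.items()))
print(os.path.join(VCE_ROOT, "videos", f"{video_id}.mp4"))
print(label)  # assumed: 27 intensity values, one per emotion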

Video to Valence (V2V) Dataset

The V2V dataset consists of preference annotations over a total of 26,670 videos, split into train, test, and listwise subsets.

Note that the train, test, and listwise videos are mutually exclusive.

The dataset has the form:

v2v_dataset/
├── metadata.json
├── train_labels.json
├── test_labels.json
├── listwise_labels.json
├── videos/00000.mp4
├── videos/00001.mp4
...

where metadata.json stores dataset metadata; train_labels.json, test_labels.json, and listwise_labels.json store the preference annotations for the train, test, and listwise splits; and videos/ contains the MP4 files.
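
As with VCE, the Python sketch below loads the V2V annotation files; the record format (pairwise and listwise preferences over video IDs) is an assumption here, so the snippet only peeks at what is actually stored on disk.

import json
import os

# Hypothetical path to the extracted V2V dataset; adjust for your system.
V2V_ROOT = "v2v_dataset"

# Load the train and listwise annotations (assumed: JSON lists or objects
# of preference records over video IDs).
with open(os.path.join(V2V_ROOT, "train_labels.json")) as f:
    train_prefs = json.load(f)
with open(os.path.join(V2V_ROOT, "listwise_labels.json")) as f:
    listwise_prefs = json.load(f)

# Print the container types and sizes, then one record, to see the schema.
print(type(train_prefs), len(train_prefs))
print(type(listwise_prefs), len(listwise_prefs))
first = train_prefs[0] if isinstance(train_prefs, list) else next(iter(train_prefs.items()))
print(first)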

Reproducibility

VideoMAE

To reproduce our results with the VideoMAE models, make sure our fork of the VideoMAE repository is present in this repository as a submodule. If the emodiversity/VideoMAE directory does not exist, run the following command from the root of this repository to fetch it:

git submodule update --init

Pretrained models

We finetune our models on top of the VideoMAE models pretrained on the Kinetics-400 dataset. Download the pretrained model "Kinetics-400, ViT-B, Epoch 1600, Pre-train checkpoint" from this page and save it to emodiversity/VideoMAE/models/kinetics400-ViTB-1600-16x5x3-pretrain.pth.

If you have gdown installed, this command does the above for you:

gdown 1tEhLyskjb755TJ65ptsrafUG2llSwQE1 --output emodiversity/VideoMAE/models/kinetics400-ViTB-1600-16x5x3-pretrain.pth
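
To confirm the download completed and the checkpoint is readable, a quick sanity check (assuming PyTorch is installed and the path above was used) is to load it on CPU and list its top-level keys:

import torch

# Path used in the gdown command above.
ckpt_path = "emodiversity/VideoMAE/models/kinetics400-ViTB-1600-16x5x3-pretrain.pth"

# Load on CPU so this works on machines without a GPU.
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Pre-train checkpoints are typically dicts (e.g. containing a model state
# dict); print the top-level keys to confirm the file is intact.
print(list(checkpoint.keys()) if isinstance(checkpoint, dict) else type(checkpoint))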

Fine-tuning models

To finetune the VideoMAE models on our datasets, update the relevant filepaths in the VideoMAE/scripts/emodiversity/finetune_vce.sh and VideoMAE/scripts/emodiversity/finetune_v2v.sh scripts to match your system, then run them:

bash VideoMAE/scripts/emodiversity/finetune_vce.sh

and

bash VideoMAE/scripts/emodiversity/finetune_v2v.sh

In practice, we use sbatch scripts with a SLURM cluster to train our models. If you would like to replicate this, please refer to finetune_vce.sbatch and finetune_v2v.sbatch.

Citation

If you find this useful in your research, please consider citing

@article{hendrycks2022wellbeing,
  title={How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios},
  author={Mantas Mazeika and Eric Tang and Andy Zou and Steven Basart and Jun Shern Chan and Dawn Song and David Forsyth and Jacob Steinhardt and Dan Hendrycks},
  journal={NeurIPS},
  year={2022}
}