yliu-cs / CVLA

[ICMR'24] Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection
https://dl.acm.org/doi/10.1145/3652583.3658094

🎬 Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li and Guodong Zhou

The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notably, our CVLA not only operates on raw signals across various modal channels but also yields an appropriate multi-modal representation by aligning the video and language components within a consistent semantic space. The experimental results on two humor detection datasets, including DY11k and UR-FUNNY, demonstrate that CVLA dramatically outperforms state-of-the-art and several competitive baseline approaches. Our dataset, code and model are available here.
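
For readers unfamiliar with contrastive alignment, the sketch below illustrates the general idea of pulling paired video and language representations together in a shared semantic space with a symmetric InfoNCE-style loss. It is a simplified illustration of the technique, not CVLA's exact objective or architecture.

# Minimal sketch of a symmetric InfoNCE-style video-language alignment loss
# (illustrative only; not the exact loss used in CVLA).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, lang_emb, temperature=0.07):
    # video_emb, lang_emb: (batch, dim) projections into the shared space
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(lang_emb, dim=-1)
    logits = v @ t.t() / temperature                    # pairwise similarities
    targets = torch.arange(v.size(0), device=v.device)  # matched pairs lie on the diagonal
    loss_v2l = F.cross_entropy(logits, targets)         # video -> language direction
    loss_l2v = F.cross_entropy(logits.t(), targets)     # language -> video direction
    return (loss_v2l + loss_l2v) / 2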

🏠 Preparations

Clone this repository:

git clone https://github.com/yliu-cs/CVLA.git
cd CVLA

The directory tree is shown below:

CVLA
├── dataset
│   ├── labeled
│   │   ├── 6557952865950764295
│   │   │   ├── video.mp4
│   │   │   ├── comment.json
│   │   │   ├── info.json
│   │   │   └── audio.wav
│   │   └── ...
│   ├── unlabeled
│   │   ├── 6937837057259621664
│   │   │   └── ...
│   │   └── ...
│   ├── split
│   │   └── train
│   └── vid2en.pkl
├── plot
│   ├── attention.py
│   ├── duration.py
│   ├── like.py
│   ├── loss.py
│   ├── Theme.md
│   └── ...
├── tools
│   ├── gather_result.py
│   ├── split_dataset.py
│   └── translate.py
├── run.sh
├── run.py
├── param.py
├── models.py
├── data.py
├── README.md
└── requirements.txt

Download our proposed dataset DY11k from the DY11k Download URL, and unzip it into the dataset folder.
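
As a rough illustration of how a labeled sample can be read once the archive is unzipped (file names follow the directory tree above; the fields inside the JSON files are not documented here, so this sketch only inspects them):

import json
from pathlib import Path

sample_dir = Path("dataset/labeled/6557952865950764295")  # one labeled video folder

with open(sample_dir / "comment.json", encoding="utf-8") as f:
    comments = json.load(f)  # viewer comments attached to the video
with open(sample_dir / "info.json", encoding="utf-8") as f:
    info = json.load(f)      # video metadata (exact fields depend on the dataset)

print(sample_dir.name, type(comments).__name__, list(info) if isinstance(info, dict) else type(info).__name__)
# video.mp4 and audio.wav can be decoded with any standard library, e.g. torchvision / torchaudio.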

โš™๏ธ Requirements

To run our code, please install all the dependency packages using the following commands:

conda create -n CVLA python=3.10
conda activate CVLA
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
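
To confirm that the CUDA build of PyTorch was picked up, a quick sanity check such as the following can be run (not part of the official setup, just a convenience):

# Verify the installed versions and CUDA availability.
import torch, torchvision, torchaudio
print(torch.__version__, torchvision.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())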

Note: Different versions of packages (e.g., PyTorch, Transformers) may lead to results that differ from those reported in the paper. However, the overall trend should hold regardless of the package versions you use.

🎮 Multiple data splits

To carry out experiments with multiple data splits (aggregating the results over 5 different seeds), use the following script:

for seed in 2 42 327 2023 998244353
do
    python tools/split_dataset.py --seed $seed
done

🚀 Experiments with multiple runs

Training the model multiple times gives a more robust measure of performance and a better estimate of the standard deviation. You can use the following script:

for seed in 2 42 327 2023 998244353
do
    python run.py --seed=$seed
done

Then run the following command to gather all the results:

python tools/gather_result.py > gathered_result.log
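
If you prefer to aggregate the numbers yourself, the sketch below shows the general idea of reporting the mean and standard deviation over seeds. The per-seed accuracies here are hypothetical placeholders, and the actual log format consumed by tools/gather_result.py may differ.

from statistics import mean, stdev

# Hypothetical per-seed test accuracies; replace with the values from your runs.
acc_by_seed = {2: 0.0, 42: 0.0, 327: 0.0, 2023: 0.0, 998244353: 0.0}

scores = list(acc_by_seed.values())
print(f"accuracy: {mean(scores):.4f} ± {stdev(scores):.4f} over {len(scores)} seeds")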

📧 Bugs or questions?

If you have any questions about the code or the paper, feel free to email Yang (yliu.cs.cn@gmail.com). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and more quickly!

โค๏ธ Acknowledgment

Thanks to Clover, TVLT, nanoGPT and thop for their excellent open-source implementations, which aided this study and are referenced in our implementation.

📜 Citation

Please cite our paper if you use CVLA in your work:

@inproceedings{conf/icmr/Liu24CVLA,
  author       = {Yang Liu and Tongfei Shen and Dong Zhang and Qingying Sun and Shoushan Li and Guodong Zhou},
  title        = {Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection},
  booktitle    = {Proc. of ICMR},
  pages        = {442--450},
  year         = {2024}
}