Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

This is the official PyTorch implementation of AGM, proposed in "Boosting Multi-modal Model Performance with Adaptive Gradient Modulation".

Paper Title: Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou

Accepted by: ICCV 2023

[arXiv] [ICCV Proceedings]

Dataset

1. AV-MNIST

This dataset can be downloaded from here.

2. CREMA-D

This dataset can be downloaded from here. For data preprocessing, refer to here.

3. UR-Funny

The raw dataset can be downloaded from here. The processed data can be obtained from here.

4. AVE

This dataset can be downloaded from here.

5. CMU-MOSEI

This dataset can be downloaded from here.

Training

Environment config

Python: 3.9.13
CUDA Version: 11.3
PyTorch: 1.12.1
Torchvision: 0.13.1
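
Before training, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `version_mismatches` are hypothetical, and the distribution names `torch` and `torchvision` are assumed to be how the packages were installed.

```python
from importlib import metadata

# Versions copied from the environment config above.
# The distribution names "torch" and "torchvision" are assumptions.
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(expected, installed):
    """Return {package: (expected, installed)} for packages that differ.

    Local build tags such as "+cu113" are ignored when comparing.
    """
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = version_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in problems.items():
        print(f"{name}: expected {want}, found {have}")
```

A different version does not necessarily mean the code will fail; the check only reports where the local environment departs from the one listed here.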

Train

Train the model with the following command:

python main.py --data_root '' --device cuda:0 --methods Normal --modality Multimodal --fusion_type late_fusion --random_seed 999 --expt_dir checkpoint --expt_name test --batch_size 64 --EPOCHS 100 --learning_rate 0.0001 --dataset AV-MNIST --alpha 2.5 --SHAPE_contribution False
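
To launch several runs that differ in only a few flags, the command above can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper `build_train_command` is hypothetical and not part of the repository, it reuses only the flags shown in the command above, and the extra seed values in the usage loop are illustrative.

```python
import shlex

# Flag values copied verbatim from the training command above.
DEFAULT_ARGS = {
    "data_root": "",
    "device": "cuda:0",
    "methods": "Normal",
    "modality": "Multimodal",
    "fusion_type": "late_fusion",
    "random_seed": "999",
    "expt_dir": "checkpoint",
    "expt_name": "test",
    "batch_size": "64",
    "EPOCHS": "100",
    "learning_rate": "0.0001",
    "dataset": "AV-MNIST",
    "alpha": "2.5",
    "SHAPE_contribution": "False",
}


def build_train_command(**overrides):
    """Return the training command as an argument list (hypothetical helper).

    Only flags that appear in the command above may be overridden.
    """
    unknown = set(overrides) - set(DEFAULT_ARGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    args = {**DEFAULT_ARGS, **overrides}
    cmd = ["python", "main.py"]
    for flag, value in args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Illustrative seeds; print each command instead of running it.
    for seed in (0, 1, 999):
        cmd = build_train_command(random_seed=seed, expt_name=f"seed_{seed}")
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to inspect them before passing each list to `subprocess.run`. Set `data_root` to the local dataset directory, which is left empty in the command above.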

Citation

@inproceedings{li2023boosting,
  title={Boosting Multi-modal Model Performance with Adaptive Gradient Modulation},
  author={Li, Hong and Li, Xingyu and Hu, Pengbo and Lei, Yinuo and Li, Chunxiao and Zhou, Yi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22214--22224},
  year={2023}
}
