This repository provides the official PyTorch implementation of the research paper:
Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition (accepted by AAAI 2024).
In real-world multimodal scenarios, humans interpret intent from cues such as facial expressions, body movements, and tone of voice. Yet most existing methods neglect the links between modalities and fail to learn nonverbal semantic features. Our paper presents the TCL-MAP approach, which leverages video- and audio-based prompts to enhance the text representation and thereby guide the learning of semantic features across modalities.
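For a concrete picture of the prompting idea before diving into the code, below is a minimal, illustrative PyTorch sketch (not the exact TCL-MAP implementation): pooled video and audio features are projected into the text embedding space and prepended to the text token embeddings as modality-aware prompt tokens. The module name, feature dimensions, mean pooling, and number of prompt tokens are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ModalityAwarePromptSketch(nn.Module):
    """Illustrative sketch: build prompt tokens from video/audio features and
    prepend them to the text token embeddings. Dimensions and the simple
    mean-pooling fusion are assumptions, not the paper's exact design."""

    def __init__(self, text_dim=768, video_dim=256, audio_dim=128, num_prompt_tokens=4):
        super().__init__()
        self.num_prompt_tokens = num_prompt_tokens
        # Project each nonverbal modality into the text embedding space.
        self.video_proj = nn.Linear(video_dim, text_dim * num_prompt_tokens)
        self.audio_proj = nn.Linear(audio_dim, text_dim * num_prompt_tokens)

    def forward(self, text_embeds, video_feats, audio_feats):
        # text_embeds: (batch, seq_len, text_dim)
        # video_feats: (batch, video_len, video_dim); audio_feats: (batch, audio_len, audio_dim)
        batch, _, text_dim = text_embeds.shape
        video_pooled = video_feats.mean(dim=1)  # (batch, video_dim)
        audio_pooled = audio_feats.mean(dim=1)  # (batch, audio_dim)
        video_prompt = self.video_proj(video_pooled).view(batch, self.num_prompt_tokens, text_dim)
        audio_prompt = self.audio_proj(audio_pooled).view(batch, self.num_prompt_tokens, text_dim)
        # Prepend the modality-aware prompt tokens to the text sequence.
        return torch.cat([video_prompt, audio_prompt, text_embeds], dim=1)

# Example usage with random tensors standing in for real features.
if __name__ == "__main__":
    model = ModalityAwarePromptSketch()
    text = torch.randn(2, 30, 768)
    video = torch.randn(2, 50, 256)
    audio = torch.randn(2, 400, 128)
    print(model(text, video, audio).shape)  # torch.Size([2, 38, 768])
```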
We use Anaconda to create the Python environment (the environment name below is only an example):
conda create --name TCL-MAP python=3.6
conda activate TCL-MAP
Install all required libraries:
pip install -r requirements.txt
The data can be downloaded through the following links:
https://drive.google.com/file/d/16f1SOamp_hRuqRqH37eophnSQr1Dl2_w/view?usp=sharing # MIntRec
https://drive.google.com/file/d/1Pn-Tqok36goVdJtuxzx4fsEP0aVjeKRb/view?usp=sharing # MELD
You can evaluate the performance of our proposed TCL-MAP on MIntRec and MELD-DA by using the following commands:
sh examples/run_TCL_MAP_MIntRec.sh
sh examples/run_TCL_MAP_MELD.sh
You can change the parameters in the configs folder. The default parameters are the best-performing ones on the two datasets.
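As a purely hypothetical illustration of the kind of hyperparameters such a config typically exposes (these names and values are not copied from the actual files in the configs folder), an entry might look like:

```python
# Hypothetical hyperparameter dictionary -- names and values are illustrative
# only and are not taken from the repository's configs folder.
hyper_param = {
    'num_train_epochs': 100,   # training epochs
    'train_batch_size': 16,    # mini-batch size
    'learning_rate': 2e-5,     # optimizer learning rate
    'temperature': 0.07,       # temperature for the token-level contrastive loss
}
```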
The overall model architecture:
The details of the MAP module:
If you are interested in this work and want to use the code or results in this repository, please star this repository and cite our paper:
@misc{zhou2023tokenlevel,
      title={Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition},
      author={Qianrui Zhou and Hua Xu and Hao Li and Hanlei Zhang and Xiaohan Zhang and Yifan Wang and Kai Gao},
      year={2023},
      eprint={2312.14667},
      archivePrefix={arXiv},
      primaryClass={cs.MM}
}
Some of the code in this repo is adapted from MIntRec, and we are grateful to its authors.
If you have any questions, please open an issue and describe your problem in as much detail as possible.