
Affective Image Filter: Reflecting Emotions from Text to Images (AIFormer)

This is the official implementation of AIFormer (ICCV 2023)

Abstract

Understanding the emotions in text and presenting them visually is a very challenging problem that requires a deep understanding of natural language and high-quality image synthesis simultaneously. In this work, we propose Affective Image Filter (AIF), a novel model that is able to understand the visually-abstract emotions from the text and reflect them to visually-concrete images with appropriate colors and textures. We build our model based on the multi-modal transformer architecture, which unifies both images and texts into tokens and encodes the emotional prior knowledge. Various loss functions are proposed to understand complex emotions and produce appropriate visualization. In addition, we collect and contribute a new dataset with abundant aesthetic images and emotional texts for training and evaluating the AIF model. We carefully design four quantitative metrics and conduct a user study to comprehensively evaluate the performance, which demonstrates our AIF model outperforms state-of-the-art methods and could evoke specific emotional responses from human observers.
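As a rough, conceptual illustration of the token-unification idea described above (a minimal PyTorch sketch with placeholder sizes, not the authors' implementation), image patches and text tokens can be projected into a shared embedding space and processed by a single transformer:

import torch
import torch.nn as nn

class MultiModalSketch(nn.Module):
    def __init__(self, d_model=256, vocab_size=30000, patch=16):
        super().__init__()
        # Project 16x16 RGB patches and word ids into the same token space.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.text_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, image, text_ids):
        img_tokens = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, d)
        txt_tokens = self.text_embed(text_ids)                           # (B, T, d)
        tokens = torch.cat([img_tokens, txt_tokens], dim=1)  # one unified sequence
        return self.encoder(tokens)

out = MultiModalSketch()(torch.randn(1, 3, 224, 224), torch.randint(0, 30000, (1, 12)))
print(out.shape)  # torch.Size([1, 208, 256]): 196 image tokens + 12 text tokens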

Prerequisites

Installation

Clone this repo:

git clone https://github.com/zpx0922/AIFormer.git

Install PyTorch and dependencies

https://pytorch.org

Install the other Python requirements:

pip install -r requirement.txt
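
Putting the installation steps together, a typical setup might look like the following (the conda environment name and Python version are assumptions; choose the PyTorch build that matches your CUDA version on pytorch.org):

git clone https://github.com/zpx0922/AIFormer.git
cd AIFormer
conda create -n aiformer python=3.8
conda activate aiformer
pip install torch torchvision
pip install -r requirement.txt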

Dataset

Download the content dataset (COCO2014).

Download the style dataset (style images; the training command below uses WikiArt).

Download the description dataset (affective descriptions).

Download the affective-prior dataset (VAD dictionary); a sketch of how such a dictionary can be used follows below.
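
A VAD dictionary maps each word to valence, arousal, and dominance scores. Here is a minimal sketch of how such an affective prior can be queried (the word,V,A,D CSV layout is an assumption; adapt it to the file you download):

import csv

def load_vad(path):
    # Assumed layout: one word per row, followed by its V, A, D scores.
    vad = {}
    with open(path, newline="", encoding="utf-8") as f:
        for word, v, a, d in csv.reader(f):
            vad[word] = (float(v), float(a), float(d))
    return vad

def description_vad(text, vad):
    # Average the VAD scores of the words covered by the dictionary.
    hits = [vad[w] for w in text.lower().split() if w in vad]
    if not hits:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c) / len(hits) for c in zip(*hits))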

Testing

Pretrained models: vgg, embedding, decoder, Transformer, VAD_emb

For a quick look at the performance of the AIF model, run the testing command below.

python test.py --content_dir content_pic --description_dir utterance.txt --output <Path_to_Output> --vgg <Path_to_VGG> --decoder <Path_to_decoder> --Trans <Path_to_transformer> --embedding <Path_to_embedding> --VAD_emb <Path_to_VAD_emb> --VAD_dic <Path_to_VAD_dictionary>

You can place content images under content_pic and edit the text description in utterance.txt.
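
As a concrete example, assuming the downloaded weights are stored under a checkpoints/ directory (all file names below are hypothetical; substitute the actual names of the files you downloaded):

python test.py --content_dir content_pic --description_dir utterance.txt --output output --vgg checkpoints/vgg.pth --decoder checkpoints/decoder.pth --Trans checkpoints/transformer.pth --embedding checkpoints/embedding.pth --VAD_emb checkpoints/VAD_emb.pth --VAD_dic checkpoints/VAD_dictionary.csv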

Training

Training requires pretrained Sentiment Vector (SV) and emotion classification models.

Pretrained models: Sentiment Vector, emotion classification

Use the following command for training:

python train.py --content_dir <Path_to_COCO2014> --style_dir <Path_to_WIKIART> --affective_ArtEmis <Path_to_Affective_description> --VAD_csv <Path_to_VAD_dictionary> --vgg <Path_to_VGG> --SV <Path_to_Sentiment_Vector> --label <Path_to_emotion_classification> --save_dir <Path_to_save_dir> --log_dir <Path_to_log_dir>
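
For example, with the datasets under data/ and the pretrained models under checkpoints/ (all paths below are hypothetical placeholders):

python train.py --content_dir data/COCO2014 --style_dir data/wikiart --affective_ArtEmis data/affective_description --VAD_csv data/VAD_dictionary.csv --vgg checkpoints/vgg.pth --SV checkpoints/SV.pth --label checkpoints/emotion_classification.pth --save_dir experiments --log_dir logs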