wisteria2gp / DataScience_survey

Deep Modular Co-Attention Networks for Visual Question Answering #31

Open wisteria2gp opened 4 years ago

wisteria2gp commented 4 years ago

Paper link

https://arxiv.org/abs/1906.10770

Authors / Affiliations

Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian

Submission date (yyyy/MM/dd)

2019/06/25

Abst

Original text

Visual Question Answering (VQA) requires a fine-grained and simultaneous understanding of both the visual content of images and the textual content of questions. Therefore, designing an effective 'co-attention' model to associate key words in questions with key objects in images is central to VQA performance. So far, most successful attempts at co-attention learning have been achieved by using shallow models, and deep co-attention models show little improvement over their shallow counterparts. In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. Each MCA layer models the self-attention of questions and images, as well as the guided-attention of images jointly using a modular composition of two basic attention units. We quantitatively and qualitatively evaluate MCAN on the benchmark VQA-v2 dataset and conduct extensive ablation studies to explore the reasons behind MCAN's effectiveness. Experimental results demonstrate that MCAN significantly outperforms the previous state-of-the-art. Our best single model delivers 70.63% overall accuracy on the test-dev set. Code is available at this https URL.

Summary in a nutshell

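A paper by the winning team of the VQA Challenge at CVPR2019. MCAN is a network of stacked MCA layers, each built from Self-Attention and Guided-Attention units. Compared with existing models, it improves accuracy by applying attention deeply both within and across the image and text modalities.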

It is quite elegant that SOTA is achieved with an architecture built from attention, a module that is already fairly standardized.

Novelty / Differences

Introduces a Transformer-like structure into the co-attention module used for multi-modal feature fusion in VQA.

Method

Stacking MCA layers amounts to applying attention repeatedly, so the architecture looks like the Transformer structure applied almost directly to multi-modal inputs.
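The stacking idea can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes one plausible reading of the abstract: each layer applies self-attention to the question and to the image, then lets the image attend to the question. Learned projections, multi-head attention, feed-forward sublayers, and layer normalization are all omitted, and the names `self_attention`, `guided_attention`, `mca_layer`, and `mcan_stack` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def residual(x, y):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]


def self_attention(x):
    # SA unit (simplified): x attends to itself, plus a residual connection.
    return residual(x, attention(x, x, x))


def guided_attention(x, guide):
    # GA unit (simplified): x supplies the queries, guide supplies keys and values.
    return residual(x, attention(x, guide, guide))


def mca_layer(question, image):
    # Assumed composition: SA on the question, SA on the image,
    # then the image is guided by the question.
    question = self_attention(question)
    image = self_attention(image)
    image = guided_attention(image, question)
    return question, image


def mcan_stack(question, image, depth):
    # Cascade the same kind of layer `depth` times.
    for _ in range(depth):
        question, image = mca_layer(question, image)
    return question, image


# Toy inputs: 3 question tokens and 2 image regions, each a 2-dimensional feature.
question_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, -1.0]]
q_out, v_out = mcan_stack(question_feats, image_feats, depth=2)
```

The outputs keep the input shapes (3 question vectors and 2 image vectors), which is what allows the layers to be cascaded to arbitrary depth.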

Results

Comments

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

This paper also seems to make a similar claim...