guanfuchen / cvpr_review

A collection of CVPR paper notes, covering abstract, motivation, architecture, results, and summary

Graph-Structured Representations for Visual Question Answering #1

Open guanfuchen opened 5 years ago

guanfuchen commented 5 years ago
| id | title | author | year |
| --- | --- | --- | --- |
| 1 | Graph-Structured Representations for Visual Question Answering | Teney, Damien and Liu, Lingqiao and van den Hengel, Anton | 2017 |
guanfuchen commented 5 years ago
abstract
This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is to require joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which do not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. We show that this approach achieves significant improvements over the state-of-the-art, increasing accuracy from 71.2% to 74.4% on the “abstract scenes” multiple-choice benchmark, and from 34.7% to 39.1% for the more challenging “balanced” scenes, i.e. image pairs with fine-grained differences and opposite yes/no answers to a same question.
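To make the idea of graph-structured inputs concrete, here is a minimal sketch of building one graph over question words and one over scene objects. It is not the authors' implementation. The `Graph` class, both builder functions, the word-adjacency edges (a stand-in for a real syntactic parse), and the fully connected scene graph with relative-offset edge features are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Graph:
    """Hypothetical container: a node list plus a dict of edge features."""
    nodes: list
    edges: dict = field(default_factory=dict)  # (i, j) -> edge feature


def build_question_graph(question: str) -> Graph:
    """One node per word; edges link adjacent words.

    Assumption: adjacency is only a placeholder for syntactic structure.
    """
    words = question.lower().rstrip("?").split()
    edges = {(i, i + 1): "next" for i in range(len(words) - 1)}
    return Graph(nodes=words, edges=edges)


def build_scene_graph(objects: list) -> Graph:
    """One node per object instance; every pair of objects is connected.

    Each object is a (name, x, y) tuple. The edge feature is the relative
    offset from object i to object j (an assumed choice of feature).
    """
    names = [name for name, _, _ in objects]
    edges = {}
    for i, j in combinations(range(len(objects)), 2):
        _, xi, yi = objects[i]
        _, xj, yj = objects[j]
        edges[(i, j)] = (xj - xi, yj - yi)
    return Graph(nodes=names, edges=edges)


question_graph = build_question_graph("Is the cat on the sofa?")
# Two separate "cat" nodes: multiple instances stay distinct in the graph.
scene_graph = build_scene_graph([("cat", 2, 3), ("sofa", 2, 1), ("cat", 7, 1)])
```

Keeping each object instance as its own node is the point of the sketch: the two cats remain separate entities with their own spatial relations, which a single pooled feature vector does not make explicit.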
guanfuchen commented 5 years ago

image

image

guanfuchen commented 5 years ago

results

image

image

guanfuchen commented 5 years ago

conclusions

image

guanfuchen commented 5 years ago

summary progress

| Summarized by | Proofread by |
| --- | --- |
| @guanfuchen | X |