about this paper

Authors: Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou & Yoshua Bengio

Link: https://arxiv.org/pdf/1703.03130.pdf

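The paper represents the sentence embedding as a matrix rather than a single vector. Each row of the matrix attends to a different part of the sentence and is computed with self-attention. The authors report improved accuracy on the three tasks used for evaluation: author profiling, sentiment classification, and textual entailment.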

What problems do they want to solve?

problems

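Models based on CNNs or LSTMs have been proposed for many tasks, but in tasks such as sentiment classification the model is given only a single sentence and no extra information, and a plain CNN-only or LSTM-only model does not work well in that setting. The more common approach there is either to apply max pooling or averaging over all time steps (Lee & Dernoncourt, 2016), or simply to take the hidden representation at the last time step as the encoded embedding (Margarit & Subramaniam, 2016). In other words, these methods all rely on one simple vector obtained from the final hidden state of the RNN or from pooling. The paper hypothesizes that carrying the semantic information of the words and the sentence along all time steps of a recurrent model is quite hard, and also unnecessary.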
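The three baseline reductions mentioned above (max pooling, averaging, last hidden state) can be sketched in a few lines. This is only an illustration: the matrix `H` below holds made-up numbers standing in for encoder hidden states, and none of the names come from the paper.

```python
# Hypothetical hidden states: 4 time steps, 3 hidden units each.
H = [
    [0.1, -0.4, 0.3],
    [0.7, 0.2, -0.5],
    [-0.2, 0.9, 0.0],
    [0.4, 0.1, 0.6],
]

# Max pooling: largest value of each hidden unit across time steps.
max_pooled = [max(col) for col in zip(*H)]

# Averaging: mean of each hidden unit across time steps.
mean_pooled = [sum(col) / len(H) for col in zip(*H)]

# Last-step representation: the hidden state at the final time step.
last_state = H[-1]

print(max_pooled, mean_pooled, last_state)
```

Each reduction collapses the whole sentence into one vector of the same size as a single hidden state, which is the limitation the paper sets out to address.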

how to solve

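Instead of max pooling or averaging, the paper proposes applying self-attention. The attention layer is placed directly on top of the LSTM, so it can be used and still perform well even when there is no additional input. In addition, because attention has direct access to the hidden representations of all time steps, it relieves the LSTM of some of its long-term memorization burden.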

model architecture

The proposed model has two parts: 1) a bi-LSTM and 2) a self-attention mechanism.

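S = the sentence (a matrix formed by concatenating all word embeddings)
n = number of tokens
S = (w_1, w_2, ..., w_n), S \in \mathbb{R}^{n \times d}
w_i = d-dimensional word embedding vector of the i-th token
u = number of hidden units in each unidirectional LSTM
H = (h_1, h_2, ..., h_n), H \in \mathbb{R}^{n \times 2u}, where each h_i concatenates the forward and backward LSTM hidden states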

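The authors' goal is to encode sentences of varying length into a fixed-size embedding, so they take a linear combination of the n LSTM hidden vectors in H. The weights of that combination come from self-attention: a = softmax(w_{s2} \tanh(W_{s1} H^T)).
W_{s1} \in \mathbb{R}^{d_a \times 2u}
w_{s2} \in \mathbb{R}^{d_a}
d_a = hyperparameter
a = annotation vector (size n)
m = the vector representation obtained by summing the rows of H weighted by the softmax output a. It represents a specific part of the sentence (a set of related words or a phrase), so it is expected to capture one aspect of the sentence's semantics.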
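A minimal sketch of this attention step in plain Python, assuming the paper's formulation a = softmax(w_{s2} tanh(W_{s1} H^T)) and m = aH. The function name `self_attention_embedding`, the sizes, and the random values standing in for bi-LSTM outputs and learned weights are all assumptions made for illustration, not the authors' code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_embedding(H, W_s1, w_s2):
    """Return (a, m): attention weights over n tokens and the weighted sum of H.

    H    : n rows of length 2u (bi-LSTM hidden states)
    W_s1 : d_a rows of length 2u
    w_s2 : list of length d_a
    """
    scores = []
    for h in H:
        # tanh(W_s1 h): project one hidden state into d_a dimensions.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W_s1]
        # w_s2 . hidden: one scalar score per token.
        scores.append(sum(w * z for w, z in zip(w_s2, hidden)))
    a = softmax(scores)  # annotation vector, length n, sums to 1
    # m = a H: linear combination of the n hidden vectors, length 2u.
    m = [sum(a_i * h[j] for a_i, h in zip(a, H)) for j in range(len(H[0]))]
    return a, m


# Hypothetical sizes: 6 tokens, u = 4 hidden units per direction, d_a = 3.
random.seed(0)
n, u, d_a = 6, 4, 3
H = [[random.gauss(0, 1) for _ in range(2 * u)] for _ in range(n)]
W_s1 = [[random.gauss(0, 1) for _ in range(2 * u)] for _ in range(d_a)]
w_s2 = [random.gauss(0, 1) for _ in range(d_a)]

a, m = self_attention_embedding(H, W_s1, w_s2)
print(len(a), len(m))
```

Whatever the sentence length n, `m` always has length 2u, which is what makes the embedding fixed-size. If `w_s2` is all zeros, every score is equal, `a` becomes uniform, and `m` reduces to plain averaging over time steps.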


A structured Self-Attentive Sentence Embedding(2017) - uncompleted #7
