about this paper

Authors: Adam Poliak, Pushpendre Rastogi, M. Patrick Martin, Benjamin Van Durme
Link: http://www.aclweb.org/anthology/E17-2081

This paper introduces a method for embedding phrases such as "on-the-fly". It works at the n-gram level.

What problems do they want to solve?

problems

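To embed a phrase, the phrase has to be treated as a single word, so the text must be preprocessed accordingly before word2vec is run. However, this approach cannot produce n-gram embeddings for expressions that never appeared during training.
Another approach merges the individual word embeddings into one new embedding with heuristics such as averaging, adding, or multiplying them (Mitchell and Lapata, 2010). However, it cannot capture the order of the words within a phrase at all.
- e.g. "shark-killer" and "killer-shark" cannot be distinguished.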
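The order-insensitivity of these heuristics is easy to check by hand. The sketch below uses made-up 3-dimensional vectors; the values and the `combine_*` helper names are illustrative assumptions, not something taken from the paper. It compares the two word orders under addition, averaging, and element-wise multiplication.

```python
# Toy word vectors; the numbers are arbitrary and only for illustration.
toy_vectors = {
    "shark": [0.2, 0.7, 0.1],
    "killer": [0.9, 0.3, 0.5],
}


def combine_add(words, vectors):
    """Element-wise sum of the word vectors."""
    return [sum(dims) for dims in zip(*(vectors[w] for w in words))]


def combine_average(words, vectors):
    """Element-wise mean of the word vectors."""
    return [total / len(words) for total in combine_add(words, vectors)]


def combine_multiply(words, vectors):
    """Element-wise product of the word vectors."""
    result = [1.0] * len(vectors[words[0]])
    for w in words:
        result = [r * x for r, x in zip(result, vectors[w])]
    return result


for combine in (combine_add, combine_average, combine_multiply):
    a = combine(["shark", "killer"], toy_vectors)
    b = combine(["killer", "shark"], toy_vectors)
    # True means the heuristic cannot tell the two phrases apart.
    print(combine.__name__, a == b)
```

All three comparisons come out equal, because each operation is commutative over the words of the phrase.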
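
how to solve

They use skip-embeddings.

model architecture

On closer inspection, a word can be decomposed into several parts. In particular, each word can be split into at least 2c contexts, one per position in a window reaching c positions to either side, and the context at each position in the window should be treated separately.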
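To make the "2c contexts" idea concrete, the following sketch groups (word, context word) pairs by their signed offset within the window. The function name `contexts_by_offset` and the example sentence are hypothetical; this only illustrates how position-specific training pairs could be separated, not how the authors implemented it.

```python
from collections import defaultdict


def contexts_by_offset(sentence, c):
    """Group (word, context word) pairs by signed offset i in [-c, c], i != 0."""
    pairs = defaultdict(list)
    for j, word in enumerate(sentence):
        for i in range(-c, c + 1):
            k = j + i  # index of the context word i positions away
            if i == 0 or k < 0 or k >= len(sentence):
                continue
            pairs[i].append((word, sentence[k]))
    return dict(pairs)


sentence = ["the", "killer", "shark", "attacked"]
for offset, pairs in sorted(contexts_by_offset(sentence, c=2).items()):
    print(offset, pairs)
```

With c = 2 this yields four groups (offsets -2, -1, 1, 2), and each group could feed its own model.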

s = sequence of words
s_j = jth word of sequence s
|s| = the length of the sequence s
S = the set of all sequences
W = indexed set of words
w = generic word
w_i = ith word of W
V, V_{out} = indexed sets of vectors of length d corresponding to W

v \in V, v_{out} \in V_{out}
V = input representations of a word, as described by Mikolov et al. (2013b)
V_{out} = output representations of a word, as described by Mikolov et al. (2013b)

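v_w = vector representing word w \in W

Each word w can be parametrized by 2c (bi-directional) embeddings. That is, for every non-zero i \in [-c : c], the i-th v_w encodes the word w with respect to one specific position, i positions to the left (-) or right (+) of w. So instead of computing the average log-probability with a single model, as word2vec does, the following objective function is applied to 2c independent models.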
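Assuming the objective mirrors the skip-gram average log-probability of Mikolov et al. (2013b), restricted to a single offset i, it could be written as below. The normalization and the notation p_i (the distribution of the model for offset i) are assumptions, not a transcription of the paper's equation; the sum skips positions where k falls outside the sequence.

```latex
\frac{1}{|S|} \sum_{s \in S} \frac{1}{|s|} \sum_{\substack{j = 1 \\ 1 \le k \le |s|}}^{|s|} \log p_i(s_k \mid s_j), \qquad k = j + i
```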

s_k = the word i positions away from s_j in s

The new probability distribution is as follows.
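Assuming the usual word2vec softmax with one pair of input and output vectors per offset, the distribution for offset i (written p_i here) would take the form below. The superscript i on v and v_{out} is an assumed notation for the offset-specific vectors.

```latex
p_i(w_k \mid w_j) =
  \frac{\exp\left( v^{i}_{w_j} \cdot v^{i}_{out,\, w_k} \right)}
       {\sum_{w \in W} \exp\left( v^{i}_{w_j} \cdot v^{i}_{out,\, w} \right)}
```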

Whereas word2vec uses d dimensions per word, each skip-embedding has only d/2c parameters. And because the 2c embeddings are each trained separately, training can be run in parallel.

Once the skip-embeddings have been built, n-gram embeddings can be created for phrases whether or not they appeared during training. To build a unigram embedding, the 2c embeddings are concatenated.
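The concatenation step can be sketched as follows. The `skip_embeddings[offset][word]` layout, the helper name `unigram_embedding`, and the toy values are assumptions made for illustration; the only thing taken from the notes is that each of the 2c vectors has d/2c dimensions.

```python
def unigram_embedding(word, skip_embeddings, c):
    """Concatenate the 2c position-specific vectors of `word`, ordered by offset."""
    offsets = [i for i in range(-c, c + 1) if i != 0]
    vector = []
    for i in offsets:
        vector.extend(skip_embeddings[i][word])
    return vector


d, c = 8, 2
per_offset_dim = d // (2 * c)  # each skip-embedding has d / 2c dimensions

# Hypothetical trained vectors: skip_embeddings[offset][word] -> list of floats.
skip_embeddings = {
    i: {"shark": [float(i)] * per_offset_dim}
    for i in range(-c, c + 1)
    if i != 0
}

vec = unigram_embedding("shark", skip_embeddings, c)
print(len(vec))  # 2c vectors of d / 2c dimensions each
```

Under these assumptions the concatenated vector has 2c * (d / 2c) = d entries.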

v^L_{[w_1 : w_n]} = left context of the n-gram
v^R_{[w_1 : w_n]} = right context of the n-gram

An order-sensitive heuristic combines the skip-embeddings into these two vectors to embed n-grams.
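One way such an order-sensitive combination could work is sketched below. The exact rule is an assumption, not taken from these notes: word w_i (1-indexed) contributes its offset -i vector to the left context and its offset n-i+1 vector to the right context, each side is averaged over the n words, and the two sides are concatenated. The function name `ngram_embedding` and the toy 1-dimensional vectors are hypothetical.

```python
def ngram_embedding(words, skip_embeddings, c):
    """Assumed heuristic: concatenate an averaged left and right context vector."""
    n = len(words)
    if not 1 <= n <= c:
        raise ValueError("this sketch needs 1 <= n <= c")
    dim = len(skip_embeddings[1][words[0]])
    left, right = [0.0] * dim, [0.0] * dim
    for i, w in enumerate(words, start=1):
        # The slot just left of the n-gram is i positions to the left of w_i.
        left = [a + b / n for a, b in zip(left, skip_embeddings[-i][w])]
        # The slot just right of the n-gram is n - i + 1 positions to the right of w_i.
        right = [a + b / n for a, b in zip(right, skip_embeddings[n - i + 1][w])]
    return left + right


# Hypothetical 1-dimensional skip-embeddings for c = 2 (values chosen arbitrarily).
skip_embeddings = {
    -2: {"shark": [1.0], "killer": [5.0]},
    -1: {"shark": [2.0], "killer": [0.0]},
    1: {"shark": [4.0], "killer": [3.0]},
    2: {"shark": [6.0], "killer": [1.0]},
}

print(ngram_embedding(["shark", "killer"], skip_embeddings, c=2))  # [3.5, 4.5]
print(ngram_embedding(["killer", "shark"], skip_embeddings, c=2))  # [0.5, 2.5]
```

Because each word picks a different position-specific vector depending on where it sits in the phrase, swapping the words changes the result, and the phrase never needs to have been seen during training.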

The resulting n-gram embedding therefore has dimension d/c, since it joins a left and a right vector of d/2c dimensions each.

experiment

dataset

More than 110 million sentences were extracted from English Wikipedia, keeping only sentences of 4 tokens or fewer.

The corpus contains more than 2 billion words.

Target phrases are taken from the Paraphrase Database (PPDB).
Evaluation set: 279 source phrases.
- 137 source phrases from PPDB's extra-extra-large phrasal subset
- 142 source phrases from PPDB's extra-extra-large lexical subset