openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License

Concatenating context and embeddings? #4

Closed: windweller closed this issue 6 years ago

windweller commented 6 years ago

Hi,

Congratulations on the paper! Those of us who actually worked on ROCStories know how difficult it is!!

I have a small question on how embeddings are handled in the code.

we = tf.get_variable("we", [n_vocab+n_special+n_ctx, n_embd], initializer=tf.random_normal_initializer(stddev=0.02))
e = tf.gather(we, X)
h = tf.reduce_sum(e, 2)

I believe this is equivalent to the tf.nn.embedding_lookup() that people normally use... so we is the word embedding matrix. My question is: what is n_ctx (context embedding)? May I ask how it is used in the model?
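
To double-check the first part of my own reading, here is a tiny TF 1.x graph-mode sketch (toy shapes, illustrative names) suggesting that tf.gather on a single table behaves the same as tf.nn.embedding_lookup:

import numpy as np
import tensorflow as tf

table = tf.constant(np.random.randn(10, 4), dtype=tf.float32)  # toy embedding table
ids = tf.constant([[1, 3], [7, 2]])                            # toy index tensor

gathered = tf.gather(table, ids)                # [2, 2, 4]
looked_up = tf.nn.embedding_lookup(table, ids)  # same result for a single params tensor

with tf.Session() as sess:
    a, b = sess.run([gathered, looked_up])
    print(np.allclose(a, b))  # expect True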

Thank you very much!


Now that I've looked at the code more closely, is it an artifact of the Transformer decoder?
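
In case it helps anyone else reading this later, here is a minimal NumPy sketch of my current reading of those three lines. The shapes, and the way X is built (token id in one channel, position id offset by n_vocab + n_special in the other), are my assumption from skimming the training code, not something stated in the repo's docs:

import numpy as np

n_vocab, n_special, n_ctx, n_embd = 100, 3, 8, 16
batch = 2

# one table stacking token, special-token, and position rows
we = np.random.randn(n_vocab + n_special + n_ctx, n_embd)

tokens = np.random.randint(0, n_vocab, size=(batch, n_ctx))
positions = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
X = np.stack([tokens, np.broadcast_to(positions, (batch, n_ctx))], axis=2)  # [batch, n_ctx, 2]

e = we[X]          # gather: [batch, n_ctx, 2, n_embd]
h = e.sum(axis=2)  # reduce_sum over axis 2: [batch, n_ctx, n_embd]

# i.e. the last n_ctx rows of we act as learned position embeddings,
# and the sum adds them to the token embeddings
assert np.allclose(h, we[tokens] + we[positions])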

windweller commented 6 years ago

I always thought the positional encoding (the sine wave) was concatenated with the word embedding... but it turns out it's summed...
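
For comparison, here is a toy sketch of the sinusoidal encoding from "Attention Is All You Need" (my own illustration, not code from this repo). It is likewise added to the token embeddings rather than concatenated; this repo simply learns the position vectors (the extra n_ctx rows of we) instead of computing them:

import numpy as np

def sinusoidal_positional_encoding(n_ctx, n_embd):
    # standard sin/cos encoding, assuming an even n_embd
    pos = np.arange(n_ctx)[:, None]                   # [n_ctx, 1]
    i = np.arange(n_embd // 2)[None, :]               # [1, n_embd // 2]
    angles = pos / np.power(10000.0, 2 * i / n_embd)  # [n_ctx, n_embd // 2]
    pe = np.zeros((n_ctx, n_embd))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

n_ctx, n_embd = 8, 16
token_emb = np.random.randn(n_ctx, n_embd)  # toy token embeddings
h = token_emb + sinusoidal_positional_encoding(n_ctx, n_embd)  # summed, not concatenated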