openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License

Concatenating context and embeddings? #4

Closed: windweller closed this issue 6 years ago

windweller commented 6 years ago

Hi,

Congratulations on the paper! Those of us who actually worked on ROCStories know how difficult it is!!

I have a small question on how embeddings are handled in the code.

we = tf.get_variable("we", [n_vocab+n_special+n_ctx, n_embd], initializer=tf.random_normal_initializer(stddev=0.02))
e = tf.gather(we, X)
h = tf.reduce_sum(e, 2)

I believe this is equivalent to the tf.nn.embedding_lookup() that people normally use... so we is the word embedding matrix. My question is: what is n_ctx (context embedding)? May I ask how it is used in the model?
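
To double-check the first part of my own reading, here is a tiny TF 1.x graph-mode sketch (toy shapes, illustrative names) suggesting that tf.gather on a single table behaves the same as tf.nn.embedding_lookup:

import numpy as np
import tensorflow as tf

table = tf.constant(np.random.randn(10, 4), dtype=tf.float32)  # toy embedding table
ids = tf.constant([[1, 3], [7, 2]])                            # toy index tensor

gathered = tf.gather(table, ids)                # [2, 2, 4]
looked_up = tf.nn.embedding_lookup(table, ids)  # same result for a single params tensor

with tf.Session() as sess:
    a, b = sess.run([gathered, looked_up])
    print(np.allclose(a, b))  # expect True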

Thank you very much!


Now that I've looked at the code more closely, is it an artifact of the Transformer decoder?
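
In case it helps anyone else reading this later, here is a minimal NumPy sketch of my current reading of those three lines. The shapes, and the way X is built (token id in one channel, position id offset by n_vocab + n_special in the other), are my assumption from skimming the training code, not something stated in the repo's docs:

import numpy as np

n_vocab, n_special, n_ctx, n_embd = 100, 3, 8, 16
batch = 2

# one table stacking token, special-token, and position rows
we = np.random.randn(n_vocab + n_special + n_ctx, n_embd)

tokens = np.random.randint(0, n_vocab, size=(batch, n_ctx))
positions = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)
X = np.stack([tokens, np.broadcast_to(positions, (batch, n_ctx))], axis=2)  # [batch, n_ctx, 2]

e = we[X]          # gather: [batch, n_ctx, 2, n_embd]
h = e.sum(axis=2)  # reduce_sum over axis 2: [batch, n_ctx, n_embd]

# i.e. the last n_ctx rows of we act as learned position embeddings,
# and the sum adds them to the token embeddings
assert np.allclose(h, we[tokens] + we[positions])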

windweller commented 6 years ago

I always thought the positional encoding (the sine wave) was concatenated with the word embedding... but it turns out it's summed...
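
For comparison, here is a toy sketch of the sinusoidal encoding from "Attention Is All You Need" (my own illustration, not code from this repo). It is likewise added to the token embeddings rather than concatenated; this repo simply learns the position vectors (the extra n_ctx rows of we) instead of computing them:

import numpy as np

def sinusoidal_positional_encoding(n_ctx, n_embd):
    # standard sin/cos encoding, assuming an even n_embd
    pos = np.arange(n_ctx)[:, None]                   # [n_ctx, 1]
    i = np.arange(n_embd // 2)[None, :]               # [1, n_embd // 2]
    angles = pos / np.power(10000.0, 2 * i / n_embd)  # [n_ctx, n_embd // 2]
    pe = np.zeros((n_ctx, n_embd))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

n_ctx, n_embd = 8, 16
token_emb = np.random.randn(n_ctx, n_embd)  # toy token embeddings
h = token_emb + sinusoidal_positional_encoding(n_ctx, n_embd)  # summed, not concatenated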