Hi,

Congratulations on the paper! Those of us who have actually worked on ROCStories know how difficult it is!!

I have a small question about how embeddings are handled in the code:
```python
we = tf.get_variable("we", [n_vocab+n_special+n_ctx, n_embd], initializer=tf.random_normal_initializer(stddev=0.02))
e = tf.gather(we, X)
h = tf.reduce_sum(e, 2)
```
I believe this is equivalent to the `embedding_lookup()` that people normally use... so `we` is the word embedding matrix. My question is: what is `n_ctx` (context embedding)? May I ask how it is used in the model?
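Just to be concrete about the equivalence I mean, here is a minimal sketch (toy sizes and names made up by me):

```python
import tensorflow as tf  # TF 1.x, as in this repo

ids = tf.constant([[1, 2], [3, 4]])
we = tf.get_variable("we", [10, 4],
                     initializer=tf.random_normal_initializer(stddev=0.02))

a = tf.gather(we, ids)               # what the snippet above does
b = tf.nn.embedding_lookup(we, ids)  # what I usually see elsewhere
# a and b should produce identical tensors here
```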
Thank you very much!
Now that I've looked at the code more closely, is it an artifact of the Transformer decoder??
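In case it helps clarify what I'm asking: my current reading is that the last `n_ctx` rows of `we` hold learned position embeddings, and that `X` carries a position id alongside each token id. This is just my guess from skimming the code, with illustrative sizes I made up:

```python
import numpy as np

n_vocab, n_special, n_ctx = 40478, 3, 77  # illustrative sizes, not necessarily the repo's exact values

tokens = np.zeros(n_ctx, dtype=np.int32)  # some encoded story, padded to n_ctx (dummy values)

# position ids would point into the last n_ctx rows of `we`
positions = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx, dtype=np.int32)

# X[..., 0] = token id, X[..., 1] = position id, so the reduce_sum over the size-2 axis
# would add the word embedding and the position embedding for each slot
X = np.stack([tokens, positions], axis=-1)  # shape [n_ctx, 2]
```

Is that roughly right?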