openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License

Position embedding matrix Wp was not used in the code? #25

Closed thanhnguyentang closed 5 years ago

thanhnguyentang commented 5 years ago

Hey, it seems from the code that the position embedding matrix W_p was not used. Am I correct?

h_0 = U W_e + W_p
h_l = transformer_block(h_{l-1})  ∀ l ∈ [1, n]
P(u) = softmax(h_n W_e^T)

Thank you.
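For reference, the three equations can be read as the following toy NumPy sketch (made-up small dimensions and an identity stand-in for the transformer block; this is not the repo's code):

```python
# Toy sketch of the paper's equations: h_0 = U W_e + W_p,
# h_l = transformer_block(h_{l-1}), P(u) = softmax(h_n W_e^T).
import numpy as np

n_vocab, n_ctx, n_embd, n_layer = 1000, 32, 16, 2   # toy dimensions
rng = np.random.default_rng(0)
W_e = rng.normal(scale=0.02, size=(n_vocab, n_embd))  # token embedding matrix
W_p = rng.normal(scale=0.01, size=(n_ctx, n_embd))    # position embedding matrix

def transformer_block(h):
    return h  # placeholder for masked self-attention + MLP

tokens = rng.integers(0, n_vocab, size=10)
# U W_e is just a row lookup; adding W_p gives h_0 = U W_e + W_p.
h = W_e[tokens] + W_p[:len(tokens)]
for _ in range(n_layer):
    h = transformer_block(h)              # h_l = transformer_block(h_{l-1})
logits = h @ W_e.T                        # output projection tied to W_e
P = np.exp(logits - logits.max(-1, keepdims=True))
P /= P.sum(-1, keepdims=True)             # P(u) = softmax(h_n W_e^T)
```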

thanhnguyentang commented 5 years ago

I found that the position embedding is implicitly applied in transform_roc, so the concern is fully addressed and I'm closing this issue.
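For future readers, here is a minimal NumPy sketch of the trick (toy dimensions and illustrative names `we`, `encode`, `embed`; the real logic lives in transform_roc and the model's embedding op): each token id is paired with a position id offset past the vocabulary, both ids are looked up in one shared matrix, and the two rows are summed, which equals U W_e + W_p.

```python
# Minimal sketch (not the repo's exact code) of folding W_p into the
# token-embedding lookup via a single shared embedding table.
import numpy as np

n_vocab, n_special, n_ctx, n_embd = 1000, 3, 32, 16  # toy dimensions
rng = np.random.default_rng(0)

# One table: token rows first, then special-token rows, then position rows.
we = rng.normal(scale=0.02, size=(n_vocab + n_special + n_ctx, n_embd))

def encode(token_ids):
    """Pair each token id with its position id (offset past the vocabulary)."""
    T = len(token_ids)
    x = np.zeros((T, 2), dtype=np.int64)
    x[:, 0] = token_ids
    x[:, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + T)
    return x

def embed(x, we):
    """Gather both ids per step and sum: the trailing n_ctx rows of `we`
    play the role of W_p, so the result is U W_e + W_p."""
    return we[x].sum(axis=1)                      # shape (T, n_embd)

tokens = rng.integers(0, n_vocab, size=10)
h0 = embed(encode(tokens), we)

# Explicit form for comparison.
W_e, W_p = we[:n_vocab + n_special], we[n_vocab + n_special:]
assert np.allclose(h0, W_e[tokens] + W_p[:len(tokens)])
```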