zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0
6.16k stars 1.18k forks

Is this a real factorization? #253

Open fangwch opened 4 years ago

fangwch commented 4 years ago

It seems that in the _local_perm function, every token in non_mask_tokens attends to all non_mask_tokens, regardless of where those tokens fall in the sampled factorization order. For details, please refer to how perm_mask is computed.
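The observation can be reproduced with a small pure-Python sketch of the masking rule. This is a simplified reading of the logic, not the repository's code: the function name perm_mask_sketch and its arguments are hypothetical, special tokens such as SEP and CLS are ignored, and the permutation is passed in rather than sampled.

```python
def perm_mask_sketch(perm_order, is_masked):
    """Simplified sketch of a permutation mask (hypothetical helper).

    perm_order[i]: rank of token i in the sampled factorization order.
    is_masked[i]:  True if token i is a prediction target.
    Returns mask where mask[i][j] == 1 means token i may NOT attend to token j.
    """
    n = len(perm_order)
    # Non-masked tokens are assigned rank -1, i.e. "before" every target.
    rev_index = [perm_order[j] if is_masked[j] else -1 for j in range(n)]
    # Targets must not see themselves; non-targets (rank -1 + 1 = 0) may.
    self_rev_index = [
        rev_index[i] if is_masked[i] else rev_index[i] + 1 for i in range(n)
    ]
    # Only masked tokens can ever be hidden (the `is_masked[j]` factor).
    return [
        [int(self_rev_index[i] <= rev_index[j] and is_masked[j]) for j in range(n)]
        for i in range(n)
    ]


# Token 0 has rank 2 and token 2 has rank 3 in the order, both non-masked.
perm_order = [2, 0, 3, 1]
is_masked = [False, True, False, True]
mask = perm_mask_sketch(perm_order, is_masked)

for row in mask:
    print(row)
```

Under these assumptions, the columns for the non-masked tokens (0 and 2) are all zero, so every token can attend to them. In particular, token 0 can attend to token 2 even though token 2 comes later in the sampled order, which is the behavior the question is about.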