guolinke / TUPE

Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models such as BERT.
MIT License

Has it been verified whether the gain comes from the pretraining stage, or whether the position encoding itself helps even without pretraining? #7

Closed guotong1988 closed 3 years ago

guotong1988 commented 3 years ago

Thanks a lot! @guolinke

guolinke commented 3 years ago

Personally, I don't think a learnable positional encoding works without being trained. Pretraining is necessary.
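For context, a "learnable positional encoding" is just a trainable embedding table indexed by position. A minimal PyTorch sketch (the class name, `max_len`, and `d_model` here are illustrative, not taken from this repo):

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Trainable absolute position embeddings added to token embeddings.
    The weights are ordinary parameters, so they receive gradients
    whenever the model is trained, in pretraining and in fine-tuning."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq_len, d_model)
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)
        return token_emb + self.pos_emb(positions)  # broadcast over batch
```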

guotong1988 commented 3 years ago

Oh, I don't quite follow.

guotong1988 commented 3 years ago

But doesn't fine-tuning also train the positional encoding, just on the fine-tuning data?

guolinke commented 3 years ago

@guotong1988 The fine-tuning dataset is too small to learn much. The original BERT also has to learn its positions during pretraining. Besides, positions and tokens should be learned together: they complement each other and capture different kinds of information. You can't pretrain only the token side and then swap in arbitrary position encodings at fine-tuning time.
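To make the "learned together but capturing different information" point concrete: TUPE computes the token-to-token and position-to-position attention terms with separate projections and adds the two logit matrices. A simplified single-head sketch of that untied attention, under the assumption of no multi-head split, no [CLS] reset, and no relative bias (all names are illustrative rather than copied from this repo):

```python
import math
import torch
import torch.nn as nn

class UntiedSelfAttention(nn.Module):
    """Single-head sketch of TUPE-style untied attention: the token part
    and the position part use separate Q/K projections, and their
    attention logits are summed before the softmax."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.q_tok = nn.Linear(d_model, d_model)
        self.k_tok = nn.Linear(d_model, d_model)
        self.v_tok = nn.Linear(d_model, d_model)
        # learnable absolute position embeddings with their own projections
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.q_pos = nn.Linear(d_model, d_model)
        self.k_pos = nn.Linear(d_model, d_model)
        self.scale = 1.0 / math.sqrt(2 * d_model)  # 1/sqrt(2d) scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        p = self.pos_emb(torch.arange(seq_len, device=x.device))  # (seq_len, d_model)

        tok_logits = self.q_tok(x) @ self.k_tok(x).transpose(-2, -1)  # (batch, L, L)
        pos_logits = self.q_pos(p) @ self.k_pos(p).transpose(-2, -1)  # (L, L), shared across the batch

        attn = torch.softmax((tok_logits + pos_logits) * self.scale, dim=-1)
        return attn @ self.v_tok(x)
```

Because `pos_emb`, `q_pos`, and `k_pos` are parameters of the same model, they are optimized jointly with the token side during pretraining; fine-tuning only nudges them on a much smaller dataset, which is the point made above.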