guolinke / TUPE

Transformer with Untied Positional Encoding (TUPE). Code for the paper "Rethinking Positional Encoding in Language Pre-training". Improves existing models like BERT.
MIT License

Fix bug of warnings not imported in mha #14

Closed · ZhiyuanChen closed this 3 years ago
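The title suggests the mha (multi-head attention) module called `warnings.warn` without importing `warnings`, so the fix is presumably a one-line import. A minimal sketch of that failure mode, assuming a `forward`-style function that emits a deprecation warning (the function name and message are illustrative, not taken from the actual diff):

```python
# Simulate a module that uses warnings.warn without importing warnings,
# and the same module after the one-line fix, by exec'ing module source.

buggy_src = """
def forward():
    # Bug: 'warnings' was never imported in this module,
    # so reaching this line raises NameError instead of warning.
    warnings.warn("argument is deprecated")
"""

fixed_src = "import warnings\n" + buggy_src  # the fix: import the module

def run(src):
    """Exec module source, call forward(), report what happened."""
    ns = {}
    exec(src, ns)
    try:
        ns["forward"]()
        return "ok"
    except NameError:
        return "NameError"

print(run(buggy_src))  # NameError: the bug this PR fixes
print(run(fixed_src))  # ok: a UserWarning is issued instead
```

Missing imports like this are easy to miss in code paths that only trigger under specific arguments, which is likely why the bug survived until a user hit that branch.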