GPT vs. BERT: given the same compute and data resources, which one is better for downstream tasks like GLUE?

zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Apache License 2.0

Issue #276

Open guotong1988 opened 3 years ago

guotong1988 commented 3 years ago

Thank you very much.

LifeIsStrange commented 3 years ago

@guotong1988 Generally speaking, XLNet is the best pretrained model, period. Sadly, the original implementation in this repository is abandonware. You should use the Hugging Face Transformers implementation instead: https://huggingface.co/transformers/model_doc/xlnet.html