Closed: guotong1988 closed this issue 3 years ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing the issue, since no further updates have been observed. Feel free to re-open if you need any further assistance.
BERT is described in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", and RoBERTa is described in the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Three years have now passed. Are there any pretrained language models that surpass them on most tasks, given the same or comparable resources? A speedup without a loss in accuracy would also count as an improvement.
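For reference, a minimal sketch of loading the two checkpoints being asked about, assuming the Hugging Face `transformers` library (not specified in this thread) and its standard hub IDs `bert-base-uncased` and `roberta-base`; any benchmark or evaluation harness for the actual comparison is left out.

```python
# Sketch: load BERT and RoBERTa side by side with Hugging Face transformers.
# Model names are the standard hub IDs; this only runs a forward pass,
# it does not evaluate on any downstream task.
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "roberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer(
        "BERT and RoBERTa are pretrained language models.",
        return_tensors="pt",
    )
    outputs = model(**inputs)
    # last_hidden_state has shape (batch, sequence_length, hidden_size)
    print(name, outputs.last_hidden_state.shape)
```

A newer candidate model could be compared the same way by swapping in its hub ID, provided the question's resource constraint (similar pretraining compute) is checked separately.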