Closed: guotong1988 closed this issue 3 years ago.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing the issue, since no further updates have been observed. Feel free to re-open if you need any further assistance.
BERT is described in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", and RoBERTa is described in the paper "RoBERTa: A Robustly Optimized BERT Pretraining Approach". Three years have now passed. Are there any pretrained language models that surpass them on most tasks, given the same or comparable resources? A speedup without a loss in accuracy would also count as an improvement.
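For reference, a minimal sketch of loading the two checkpoints being asked about, assuming the Hugging Face `transformers` library (not specified in this thread) and its standard hub IDs `bert-base-uncased` and `roberta-base`; any benchmark or evaluation harness for the actual comparison is left out.

```python
# Sketch: load BERT and RoBERTa side by side with Hugging Face transformers.
# Model names are the standard hub IDs; this only runs a forward pass,
# it does not evaluate on any downstream task.
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "roberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer(
        "BERT and RoBERTa are pretrained language models.",
        return_tensors="pt",
    )
    outputs = model(**inputs)
    # last_hidden_state has shape (batch, sequence_length, hidden_size)
    print(name, outputs.last_hidden_state.shape)
```

A newer candidate model could be compared the same way by swapping in its hub ID, provided the question's resource constraint (similar pretraining compute) is checked separately.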