ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series of models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

RoBERTa pre-training data #181

Closed · Daemon-ser closed this issue 3 years ago

Daemon-ser commented 3 years ago

Is the RoBERTa pre-training data made up entirely of 512-token long sequences, or does it include 10% short sequences like BERT?

ymcui commented 3 years ago

All long sequences; we did not mix long and short. https://github.com/ymcui/Chinese-BERT-wwm#模型对比
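For readers unfamiliar with the "10% short sequences" convention mentioned in the question: it comes from Google's original BERT data-creation script, where `short_seq_prob` (default 0.1) occasionally samples a shorter target length so the model also sees short inputs. Below is a minimal Python sketch of that sampling logic, only to illustrate the difference being asked about; it assumes a BERT-style `create_pretraining_data` pipeline and is not necessarily the exact tooling used for this repo's RoBERTa-wwm-ext data.

```python
import random

def sample_target_length(max_seq_length=512, short_seq_prob=0.1, rng=None):
    """Mimic the target-length choice in BERT's create_pretraining_data.py:
    with probability `short_seq_prob`, pick a shorter target length so the
    model also sees short inputs during pre-training."""
    rng = rng or random.Random(12345)
    # Reserve room for [CLS], [SEP], [SEP] as in sentence-pair packing.
    max_num_tokens = max_seq_length - 3
    if rng.random() < short_seq_prob:
        return rng.randint(2, max_num_tokens)
    return max_num_tokens

# BERT-style data: roughly 10% of instances get a shorter target length.
bert_lengths = [sample_target_length(short_seq_prob=0.1) for _ in range(10)]

# Per the answer above, the RoBERTa data here is all full-length,
# which corresponds to setting short_seq_prob to 0.
roberta_lengths = [sample_target_length(short_seq_prob=0.0) for _ in range(10)]

print(bert_lengths)
print(roberta_lengths)
```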

Daemon-ser commented 3 years ago

OK, thanks.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.