ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0
9.67k stars 1.39k forks

Could you provide a download link for the EXT data? #174

Closed bbbxixixixi closed 3 years ago

bbbxixixixi commented 3 years ago

I'd like to compute concrete word frequencies and use them to train a model on encrypted data, assigning initial weights according to frequency rank. If you have already computed the frequencies during pre-training, it would be even better if you could share them directly. [1] The EXT data includes: Chinese Wikipedia, plus other encyclopedia, news, and Q&A data, for a total of 5.4B words.
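The frequency-to-weight idea described above can be sketched as follows. This is a minimal illustration, not the repo's method: it assumes a corpus that has already been word-segmented into whitespace-separated tokens (for raw Chinese text you would first run a segmenter such as jieba), and it simply normalizes counts into weights proportional to frequency.

```python
from collections import Counter

def word_frequencies(lines):
    """Count word frequencies over an iterable of pre-segmented lines
    (tokens separated by whitespace)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Toy stand-in for a segmented corpus (hypothetical example data).
corpus = [
    "中文 维基 百科",
    "新闻 问答 数据",
    "中文 新闻",
]

freqs = word_frequencies(corpus)

# Turn raw counts into initial weights proportional to frequency.
total = sum(freqs.values())
weights = {word: count / total for word, count in freqs.items()}
```

For a 5.4B-word corpus you would stream the files line by line rather than holding them in memory; `Counter.update` works the same way in that setting.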

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing the issue, since no updates were observed. Feel free to re-open if you need any further assistance.