ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0
9.56k stars 1.38k forks

How large is the EXT dataset? #218

Closed xueyuan1990 closed 2 years ago

xueyuan1990 commented 2 years ago

The project description says the EXT data contains 5.4B words in total, but I am not sure what the "B" means here.

ymcui commented 2 years ago

1) B stands for billion. 2) Measured by disk space, the data takes up 20 GB.