ymcui / Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
https://ieeexplore.ieee.org/document/9599397
Apache License 2.0

Roughly how many epochs did you train? #28

Closed. hzrpku closed this issue 5 years ago.

hzrpku commented 5 years ago

Also, is your dupe_factor parameter left at the default of 10? Thanks!

ymcui commented 5 years ago
  1. Do you mean BERT-wwm-ext? The first pre-training stage (maximum sequence length 128) used a batch size of 2,560 and ran for 1M steps. The second stage (maximum sequence length 512) used a batch size of 384 and ran for 400K steps.
  2. dupe_factor=5 (see the sketch after this reply for what this parameter controls).
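
For context, dupe_factor comes from Google's create_pretraining_data.py: it controls how many times the corpus is duplicated, each copy with a different random masking. The sketch below only mirrors that script's outer loop; create_instances stands in for the real instance-building and masking logic and is not part of the original code.

```python
import random

def generate_training_instances(documents, dupe_factor, create_instances, rng=None):
    """Illustrative sketch of the structure of BERT's create_pretraining_data.py:
    the corpus is walked dupe_factor times, and each pass draws a fresh random
    masking, so every sentence ends up in the tf.record files dupe_factor times
    with different [MASK] positions."""
    rng = rng or random.Random(12345)
    instances = []
    for _ in range(dupe_factor):            # dupe_factor=5 here instead of the default 10
        for document in documents:
            instances.extend(create_instances(document, rng))  # masking happens inside
    rng.shuffle(instances)
    return instances
```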
hzrpku commented 5 years ago

What about the number of training epochs and the dupe_factor for the first model, BERT-wwm?

ymcui commented 5 years ago

It is written in the technical report.

We train 100K steps on the samples with a maximum length of 128, batch size of 2,560, an initial learning rate of 1e-4 (with warm-up ratio 10%). Then, we train another 100K steps on a maximum length of 512 with a batch size of 384 to learn the long-range dependencies and position embeddings. 

dupe_factor is also 5.
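
Since the issue title asks about epochs while the report quotes steps, here is a rough way to convert one to the other. The instance count below is a made-up placeholder, not a number from the report or the repository:

```python
# Rough steps-to-epochs conversion for the 128-length stage of BERT-wwm.
# NUM_INSTANCES is a hypothetical placeholder; the real count depends on the
# corpus (Chinese Wikipedia) and on dupe_factor=5.
num_train_steps = 100_000      # from the technical report (128-length stage)
train_batch_size = 2_560       # from the technical report
NUM_INSTANCES = 50_000_000     # placeholder: instances in the generated tf.record files

examples_seen = num_train_steps * train_batch_size
passes = examples_seen / NUM_INSTANCES
print(f"~{passes:.1f} passes over the generated training instances")
```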

hzrpku commented 5 years ago

Thanks for the answer!

  1. How large was the input data, i.e. the raw text rather than the tf.record files?
  2. If the batch size cannot be made that large (2,560), do you have any good suggestions?

ymcui commented 5 years ago
  1. The model without "ext" was trained on Chinese Wikipedia only; for the "ext" version the raw text is roughly 15~20 GB. I did not keep exact statistics.
  2. If you cannot use a large batch size, consider gradient accumulation (a short sketch follows this reply). A batch size that is too small does indeed hurt performance (this has been confirmed in the BERT/XLNet GitHub repos).
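
A minimal gradient-accumulation sketch, written as a PyTorch-style loop for brevity (the released models were trained with TensorFlow). The model(**batch).loss call assumes a HuggingFace-style interface; model, data_loader, and accum_steps are placeholders:

```python
def train_epoch(model, data_loader, optimizer, accum_steps=8):
    """Gradient accumulation: sum gradients from accum_steps small mini-batches
    before one optimizer update, so the effective batch size is
    accum_steps * per-step batch size."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(data_loader):
        loss = model(**batch).loss          # assumes a HuggingFace-style model output
        (loss / accum_steps).backward()     # scale so the summed gradient is an average
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```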
hzrpku commented 5 years ago

Thank you so much!