[Closed] hzrpku closed this issue 5 years ago.
Could you share the number of training epochs and the dupe_factor parameter used for the first model, BERT-wwm?
They're given in the technical report.
We train for 100K steps on samples with a maximum sequence length of 128, a batch size of 2,560, and an initial learning rate of 1e-4 (with a 10% warm-up ratio). We then train for another 100K steps with a maximum length of 512 and a batch size of 384 to learn long-range dependencies and position embeddings.
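For readers who want to reproduce this, here is a minimal Python sketch of the two-stage schedule, assuming the standard BERT linear warm-up/linear decay recipe; the constants come from the reply above, but the function itself is this editor's illustration, not the authors' code:

```python
# Two-stage schedule quoted in the reply above.
STAGES = [
    {"max_seq_length": 128, "train_batch_size": 2560, "num_train_steps": 100_000},
    {"max_seq_length": 512, "train_batch_size": 384,  "num_train_steps": 100_000},
]
INIT_LR = 1e-4
WARMUP_RATIO = 0.10  # 10% warm-up -> 10K warm-up steps per 100K-step stage

def lr_at(step: int, num_train_steps: int) -> float:
    """Linear warm-up to INIT_LR, then linear decay to zero (standard BERT recipe)."""
    warmup = int(num_train_steps * WARMUP_RATIO)
    if step < warmup:
        return INIT_LR * step / max(1, warmup)
    return INIT_LR * (num_train_steps - step) / max(1, num_train_steps - warmup)
```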
The dupe_factor is also 5.
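For context, dupe_factor in BERT's create_pretraining_data.py controls how many times each document is duplicated, each time with a fresh random mask, when generating the tf.record files; with dupe_factor=5, the corpus appears five times under different maskings, roughly playing the role of epochs over the static data. A simplified sketch of that behavior (not the real script, which also handles whole-word masking, NSP pairs, and sequence packing):

```python
import random

def create_instances(documents, dupe_factor=5, rng=random.Random(12345)):
    """Process each document dupe_factor times, drawing a different random
    masking on every pass, so no masked instance repeats exactly."""
    instances = []
    for _ in range(dupe_factor):  # 5 passes over the corpus here
        for doc in documents:
            # Simplified 15% token masking for illustration only.
            masked = [tok if rng.random() > 0.15 else "[MASK]" for tok in doc]
            instances.append(masked)
    rng.shuffle(instances)
    return instances
```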
Thanks for the answer!
1. How large was the input data as raw text (not the tf.record files)?
2. If we can't reach such a large batch size (2,560), do you have any suggestions?
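The thread doesn't record a reply to question 2, but the usual workaround when hardware can't fit the published batch size is gradient accumulation: sum gradients over k micro-batches and apply one averaged update, emulating an effective batch of k times the micro-batch size. A framework-agnostic sketch (all names hypothetical):

```python
def train_with_accumulation(batches, compute_grads, apply_update, accum_steps=8):
    """Emulate a large batch (accum_steps * micro-batch) on small hardware.
    A trailing partial accumulation at the end of `batches` is dropped."""
    accum = None
    for i, batch in enumerate(batches, start=1):
        grads = compute_grads(batch)  # gradients for one micro-batch
        accum = grads if accum is None else [a + g for a, g in zip(accum, grads)]
        if i % accum_steps == 0:
            apply_update([g / accum_steps for g in accum])  # averaged update
            accum = None
```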
Thank you so much!
One more question: did you use the default dupe_factor of 10? Thanks!