Question about the details of second-stage token-level training

shaochenze / PatchTrain

Code for paper "Patch-Level Training for Large Language Models"

Apache License 2.0

Closed: jyweky closed this issue 1 month ago

jyweky commented 1 month ago

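Hello, does the second-stage token-level training use warmup, and if so, how many steps does it run for? Judging from the loss curve in Figure 2 of the paper, it doesn't look like there are any warmup steps 🤔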

shaochenze commented 1 month ago

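In the second stage of training, all settings other than the number of steps are kept essentially unchanged, so warmup is again 2000 steps. For details, see the run_patch.sh script.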
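(Editor's illustration, not taken from the repository.) As a rough sketch of what a 2000-step warmup means for the learning rate, the function below ramps linearly from zero to a peak value and then holds it. The function name `warmup_lr`, the `peak_lr` parameter, the linear ramp shape, and the absence of any decay after warmup are all assumptions made for illustration; the actual schedule is whatever run_patch.sh configures.

```python
def warmup_lr(step: int, peak_lr: float, warmup_steps: int = 2000) -> float:
    """Hypothetical linear warmup: ramp from 0 to peak_lr, then hold.

    Any decay after warmup is deliberately omitted; this only
    illustrates the warmup phase discussed in the thread.
    """
    if step < warmup_steps:
        # Fraction of the warmup completed so far, in [0, 1).
        return peak_lr * step / warmup_steps
    return peak_lr


if __name__ == "__main__":
    # peak_lr=1.0 is a placeholder so the output reads as a fraction of peak.
    for s in (0, 500, 1000, 2000, 5000):
        print(s, warmup_lr(s, peak_lr=1.0))
```

If the second stage restarts warmup like this, the learning rate is small for its first steps, which is one reason to check how the loss curve behaves right after the switch to token-level training.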

jyweky commented 1 month ago

I see, thank you for clearing that up.