Pre-training an initial model - Githubissues

jiahe7ay / MINI_LLM

This is a repository used by individuals to experiment and reproduce the pre-training process of LLM.

348 stars 53 forks

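Yes, that is supported. You can adjust the model structure by editing the config file.

Thanks! I followed your code and got training to run, but the loss looks abnormal: as shown in the figure, it abruptly drops to 0. Have you run into this before? It feels like something went wrong in training. The training data is the 563w_baidubaike.json file provided in the Readme, and the machine has two T4 GPUs. I changed a few parameters (num_hidden_layers=6, per_device_train_batch_size=2, gradient_accumulation_steps=60), used fp16, and did not use flash_att. All other code and parameters are unchanged.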
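To make the reported setup concrete, here is a minimal sketch of two sanity checks for a run like this one: the effective batch size implied by the settings above, and the first logged step at which the loss collapses to zero or becomes non-finite. The helper names `effective_batch_size` and `first_suspicious_step` are hypothetical and not part of MINI_LLM, the loss values are made up for illustration, and the batch-size figure assumes plain data parallelism in which each of the two T4s processes its own batch.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices):
    """Sequences consumed per optimizer step (assumes data parallelism)."""
    return per_device_batch * grad_accum_steps * num_devices


def first_suspicious_step(losses, eps=1e-8):
    """Return the index of the first loss that is ~0, NaN, or inf, else None."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss) or abs(loss) < eps:
            return step
    return None


# Settings quoted in the comment above: batch 2, accumulation 60, two GPUs.
print(effective_batch_size(2, 60, 2))

# Illustrative (made-up) loss log in which the loss suddenly hits zero.
example_losses = [8.1, 6.3, 5.2, 0.0, 0.0]
print(first_suspicious_step(example_losses))
```

Knowing the step at which the loss first collapses makes it easier to line the event up with other logged values, such as the learning rate or gradient norm at that point.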


Pre-training an initial model #20