Questions about the details of the pre-training data

jiahe7ay / MINI_LLM

A personal repository for experimenting with and reproducing the pre-training process of an LLM.

348 stars 53 forks

Open awdrgyjilplij opened 6 months ago

awdrgyjilplij commented 6 months ago

jiahe7ay commented 6 months ago

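There isn't really any particular advantage. At first I only wanted to train on the wiki and Baidu data. Then I saw that the loss was going down reasonably well, so I continued training on the Tiangong (天工) data. The second stage trains purely on the Tiangong data.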

awdrgyjilplij commented 6 months ago

Also, I noticed that neither pre-training nor SFT passes attention_mask. Why is that? Won't it affect how the model treats pad tokens?

jiahe7ay commented 6 months ago

If you don't pass attention_mask, Qwen will generate one for you automatically.
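
For readers who want to see what an explicit padding mask looks like, here is a minimal sketch in plain Python. It is not code from this repository: the helper name pad_batch, the right-padding layout, and the choice to mask pad positions out of the loss with -100 are all assumptions made for illustration. Whether a given model builds a padding mask on its own when attention_mask is omitted depends on its implementation, so passing one explicitly is the safer way to be sure.

```python
from typing import List, Tuple

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions labeled with it do not contribute to the loss.
IGNORE_INDEX = -100


def pad_batch(
    sequences: List[List[int]], pad_token_id: int
) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Right-pad token id sequences (hypothetical helper, illustration only).

    Returns (input_ids, attention_mask, labels) where:
      - attention_mask is 1 for real tokens and 0 for padding
      - labels copy the real tokens and use IGNORE_INDEX on padding
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        labels.append(list(seq) + [IGNORE_INDEX] * n_pad)
    return input_ids, attention_mask, labels


if __name__ == "__main__":
    ids, mask, labels = pad_batch([[5, 6, 7], [8]], pad_token_id=0)
    print(ids)
    print(mask)
    print(labels)
```

With right padding and causal attention, a real token only attends to positions at or before itself, so trailing pad tokens cannot change its hidden state. In that setup the more important question is whether pad positions are excluded from the loss, which the labels list above handles.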