jiahe7ay / MINI_LLM

A repository for individuals to experiment with and reproduce the pre-training process of LLMs.

How is the validation set for the pretrain stage selected? #7

Closed moyans closed 7 months ago

moyans commented 7 months ago

EVAL_FILE = './datasets/pretrain_eval_512_1w.parquet'

jiahe7ay commented 7 months ago

The data construction script has a gen_bell function that can generate the validation data. I forgot to mention it in the README.
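
For readers without the script at hand, here is a minimal sketch of one way to hold out a validation file like the one referenced above. It is not the repository's gen_bell implementation, which is not shown in this thread. The helper names `make_eval_split` and `save_parquet`, the `text` column name, the 10,000-sample size, and the 512 limit (applied here to characters as a stand-in for token length) are all assumptions guessed from the file name.

```python
import random


def make_eval_split(texts, n_eval=10_000, max_len=512, seed=0):
    """Hold out up to n_eval samples, each truncated to max_len characters.

    Hypothetical helper, NOT the repository's gen_bell function.
    Returns (train_texts, eval_texts).
    """
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    indices = list(range(len(texts)))
    rng.shuffle(indices)
    eval_idx = set(indices[:n_eval])   # randomly chosen held-out positions

    # Training data keeps everything that was not held out.
    train_texts = [t for i, t in enumerate(texts) if i not in eval_idx]
    # Assumption: truncate by characters; a real pipeline would likely count tokens.
    eval_texts = [texts[i][:max_len] for i in sorted(eval_idx)]
    return train_texts, eval_texts


def save_parquet(rows, path):
    """Write texts to a parquet file (requires pandas plus pyarrow or fastparquet)."""
    import pandas as pd  # imported lazily so the split itself has no dependencies

    # Assumption: a single "text" column; the real file's schema is not known here.
    pd.DataFrame({"text": rows}).to_parquet(path, index=False)
```

Usage would look like `train, ev = make_eval_split(all_texts)` followed by `save_parquet(ev, EVAL_FILE)`. Sampling the validation rows out of the same corpus, rather than copying them, keeps them from also appearing in the training data.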