jiahe7ay / MINI_LLM

A personal repository for experimenting with and reproducing the pre-training process of LLMs.
327 stars 52 forks

data_process.py file not found #1

Closed: Yaoisss closed this issue 6 months ago

Yaoisss commented 6 months ago

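+++++++++++++++++

  1. Switch to the dataset_utils directory and run generate_data.py. Before running it, edit the .py file and uncomment the data-processing function calls; otherwise the script will not run.
  2. Run data_process.py to generate the parquet files in the ./datasets/ directory:

         cd dataset_utils
         python data_process.py

+++++++++++++++++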

I could not find this data_process.py. Looking at the source code, should it be generate_data.py instead?

jiahe7ay commented 6 months ago

Yes, that was my mistake.

jiahe7ay commented 6 months ago

I have updated the README. Thanks for the correction.