Closed: peiyingxin closed this issue 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue since no updates have been observed. Feel free to re-open if you need further assistance.
The following items must be checked before submission
Issue type
Model training and fine-tuning
Base model
LLaMA-7B
Operating system
Linux
Describe the problem in detail
How many billion tokens does the pretraining data contain? The project mentions using 120 GB of Chinese corpus, which should correspond to roughly 30-40B tokens. However, the token count computed from the pretraining configuration differs substantially from that: total tokens = 1024 × 512 × 6000 ≈ 3.1B.
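For illustration, here is a minimal sketch of the arithmetic behind the discrepancy. The interpretation of the three config figures (1024 as the effective batch size, 512 as the block/sequence length, 6000 as the number of training steps) and the bytes-per-token ratio for Chinese text are assumptions for the sake of the estimate, not values confirmed by the project:

```python
# Minimal sketch of the token-count arithmetic; all values below are
# assumptions read off the question, not confirmed by the project.

batch_size = 1024    # assumed effective global batch size
block_size = 512     # assumed sequence length per sample
train_steps = 6000   # assumed number of training steps

# Tokens actually consumed during pretraining
tokens_seen = batch_size * block_size * train_steps
print(f"tokens seen: {tokens_seen / 1e9:.1f}B")  # -> 3.1B

# Rough size of the corpus itself: 120 GB of Chinese text,
# assuming roughly 3-4 bytes of UTF-8 per token after tokenization
corpus_bytes = 120 * 1024**3
for bytes_per_token in (3, 4):
    est = corpus_bytes / bytes_per_token / 1e9
    print(f"~{est:.0f}B tokens at {bytes_per_token} bytes/token")
```

Under these assumptions the configured run would cover only a fraction of one epoch over the full corpus, which would explain the gap between the ~30-40B-token corpus estimate and the ~3.1B tokens implied by the config.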
Dependency information (must be provided for code-related issues)
Run logs or screenshots