Closed · huwei02 closed this 6 months ago
Closing comment: done, resolved.
The following items must be checked before submission
Issue type
Model training and fine-tuning
Base model
Chinese-LLaMA-2 (7B/13B)
Operating system
Linux
Describe the problem in detail
This run uses Chinese-LLaMA-2-13B.
I removed the LoRA-related settings from run_pt.sh, but after starting training the trainable parameter count is still nowhere near 13 billion. What causes this, and how do I fix it? Number of trainable parameters = 572,784,640
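For what it's worth, the logged count is exactly consistent with the training script still applying a PEFT/LoRA wrapper: rank-8 adapters on q_proj/v_proj across 40 layers, plus a fully trainable embed_tokens and lm_head over the extended 55,296-token vocabulary. These shapes are assumptions reverse-engineered from the 13B geometry, not something the log states; a minimal sketch of the arithmetic:

```python
# Hedged arithmetic check: does a default-ish LoRA setup reproduce the logged
# 572,784,640 trainable parameters? All shapes are assumptions: 13B geometry
# (40 layers, hidden size 5120) and the Chinese-LLaMA-2 extended vocabulary
# (55,296 tokens).
layers, hidden, vocab, rank = 40, 5120, 55296, 8

# LoRA adds two low-rank matrices per wrapped linear layer: A (rank x in)
# and B (out x rank). Assuming only q_proj and v_proj (both hidden -> hidden):
lora_params = layers * 2 * rank * (hidden + hidden)   # 6,553,600

# embed_tokens and lm_head kept fully trainable (modules_to_save):
embedding_params = 2 * vocab * hidden                 # 566,231,040

print(lora_params + embedding_params)                 # 572,784,640 -- matches
```

If that reading is right, deleting the LoRA flags from run_pt.sh merely reverted them to the script's defaults rather than disabling PEFT; the LoRA wrapping itself would have to be removed (or a full-parameter training script used) before all 13B weights become trainable.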
Dependencies (must be provided for code-related issues)
sentencepiece 0.1.91
Run log or screenshot
[INFO|trainer.py:1723] 2023-12-27 11:49:52,215 >> Running training
[INFO|trainer.py:1724] 2023-12-27 11:49:52,215 >> Num examples = 445,413
[INFO|trainer.py:1725] 2023-12-27 11:49:52,215 >> Num Epochs = 1
[INFO|trainer.py:1726] 2023-12-27 11:49:52,215 >> Instantaneous batch size per device = 4
[INFO|trainer.py:1729] 2023-12-27 11:49:52,215 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1730] 2023-12-27 11:49:52,215 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1731] 2023-12-27 11:49:52,215 >> Total optimization steps = 27,838
[INFO|trainer.py:1732] 2023-12-27 11:49:52,217 >> Number of trainable parameters = 572,784,640
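As a side note, the remaining log numbers are self-consistent under a single-GPU reading (an assumption; the log itself does not state the device count):

```python
# Consistency check of the logged schedule, assuming one device:
# total batch = per-device batch x grad accumulation x world size.
import math

num_examples, per_device, grad_accum, world_size = 445_413, 4, 4, 1
total_batch = per_device * grad_accum * world_size         # 16, matches the log

# One epoch: the dataloader yields ceil(N / per-device batch) batches, and
# each optimizer step consumes grad_accum of them (trailing remainder not
# counted as a full step) -- a plausible reading of how the step count arises.
dataloader_batches = math.ceil(num_examples / per_device)  # 111,354
steps = dataloader_batches // grad_accum                   # 27,838, matches the log
print(total_batch, steps)
```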