Closed — greatiliad closed this issue 9 months ago
Is DeepSpeed using the ZeRO-3 strategy here?
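For reference, the ZeRO stage is set in the DeepSpeed JSON config passed to training. A minimal ZeRO-3 fragment might look like the following (illustrative sketch, not this repo's actual config file); the `stage3_gather_16bit_weights_on_model_save` flag matters for the empty-checkpoint symptom discussed below, since without gathering, ZeRO-3's partitioned parameters may not end up in the saved state dict:

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```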
It seems the number of epochs was set too low. If I set epochs to 100 and remove `modules_to_save`, the result is about 300 MB; with `modules_to_save` it is about 1.2 GB. Are those sizes normal?
Yes, that is normal.
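Those numbers are consistent with a back-of-the-envelope estimate. A sketch, assuming LLaMA-2-7B shapes with Chinese-Alpaca-2's expanded ~55k vocabulary, rank-64 LoRA on all attention and MLP projections, and fp16 storage — all assumptions for illustration, not confirmed settings from this run:

```python
# Rough size estimate for a LoRA adapter on a 7B LLaMA-like model.
# Assumed shapes: hidden 4096, intermediate 11008, 32 layers,
# ~55k expanded vocab, LoRA rank 64 (hedged: your run may differ).
hidden, inter, layers, vocab, r = 4096, 11008, 32, 55296, 64

def lora_params(n_in, n_out, rank):
    """LoRA adds A (rank x n_in) and B (n_out x rank) per targeted module."""
    return rank * n_in + n_out * rank

per_layer = (
    4 * lora_params(hidden, hidden, r)   # q/k/v/o projections
    + 2 * lora_params(hidden, inter, r)  # gate/up projections
    + lora_params(inter, hidden, r)      # down projection
)
lora_total = per_layer * layers

# modules_to_save stores FULL copies of embed_tokens and lm_head,
# which is why the adapter grows so much with the expanded vocab.
full_copies = 2 * vocab * hidden

bytes_fp16 = 2
print(f"LoRA only:         ~{lora_total * bytes_fp16 / 2**20:.0f} MiB")
print(f"+ modules_to_save: ~{(lora_total + full_copies) * bytes_fp16 / 2**20:.0f} MiB")
```

Under these assumptions this gives roughly 305 MiB for the LoRA weights alone and roughly 1.2 GiB once the full embedding and LM-head copies are added, matching the sizes you observed.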
When training with LoRA in general, what is a good value for `--num_train_epochs`?
It depends on the dataset size, learning rate, and other factors; it needs tuning, and there is no single recommended setting.
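One way to reason about it: what mostly matters for convergence is the total number of optimizer steps, which epochs only determine together with dataset size and effective batch size. A sketch with made-up numbers (all values are hypothetical):

```python
# Hypothetical run: relate num_train_epochs to total optimizer steps.
dataset_size = 50_000   # assumed number of training examples
per_device_bs = 1       # per-GPU batch size
grad_accum = 8          # gradient accumulation steps
num_gpus = 4
epochs = 3

effective_bs = per_device_bs * grad_accum * num_gpus
steps_per_epoch = dataset_size // effective_bs
total_steps = steps_per_epoch * epochs
print(f"effective batch size: {effective_bs}, total optimizer steps: {total_steps}")
```

With a small dataset, even many epochs can yield few steps, so comparing total steps (and watching eval loss) is more informative than comparing epoch counts across runs.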
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.
The following items must be checked before submitting
Issue type
Model training and fine-tuning
Base model
Chinese-Alpaca-2 (7B/13B)
Operating system
Linux
Describe the issue in detail
After training finishes, sft_lora_model is only a few hundred KB, and adapter_model.bin in the checkpoint is also empty. Here is the state of the files in the checkpoint:
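A quick heuristic check for this symptom (path and size threshold are illustrative assumptions): under ZeRO-3, weights that were never gathered before saving can leave the adapter file near-empty, whereas a healthy rank-64 adapter for a 7B model should be hundreds of MB:

```python
# Heuristic sanity check on a saved adapter file. A LoRA adapter for a
# 7B model at rank 64 should be hundreds of MB; a few hundred KB suggests
# the weights were still ZeRO-3 partitioned at save time.
import os

def adapter_looks_empty(path, min_bytes=1_000_000):
    """Treat an adapter file under ~1 MB as effectively empty (heuristic)."""
    return os.path.getsize(path) < min_bytes

# Demonstration with a stand-in file instead of a real checkpoint:
with open("adapter_model.bin", "wb") as f:
    f.write(b"\x00" * 500_000)  # a few hundred KB, like the reported symptom
print(adapter_looks_empty("adapter_model.bin"))  # flags the file as suspect
```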
Dependencies (required for code-related issues)
Run logs or screenshots