questions about Training using train_llm.sh

wphtrying commented 1 month ago

python -u train_math.py --seed 10 \
    --dataset_name "prealgebra" \
    --dataset_path "../envs/math/data/math_500.jsonl" \
    --model_name_or_path "/Qwen2.5-Math-1.5B-Instruct" \
    --prm_type "MS" \
    --prm_model_name_or_path "/math-shepherd-mistral-7b-prm/math-shepherd-mistral-7b-prm" \
    --algorithm_name "APPO" \
    --experiment_name "ms_single" \
    --num_mini_batch 4 \
    --ppo_epoch 1

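I am training with the command above on a 910, and the action output is "!!!!!!!!!!!". Is that normal? Also, I expected the demo training dataset to include the intermediate steps, so why does it contain only the question and the final answer?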

Update: I looked at the code. Inference here is done through transformer. Why is the following text prepended to the prompt? Let's solve math problems step by step.\n\nJanet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\nStep 1: Janet's ducks lay 16 eggs per day. ки\nStep 2: She eats three for breakfast every morning, so she has 16 - 3 = 13 eggs left. ки\nStep 3: She bakes muffins for her friends every day with four eggs, so she has 13 - 4 = 9 eggs left. ки\nStep 4: She sells the remainder at the farmers' market daily for $2 per fresh duck egg, so she makes 9 * $2 = $18 every day at the farmers' market. ки\nStep 5: The answer is: 18. ки\n\n
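For readers unfamiliar with this layout, here is a minimal sketch of how a one-shot prefix like the one quoted above could be assembled: an instruction line, one worked example whose steps each end with the step tag "ки", then the new question. The function names `format_steps` and `build_prompt` and the toy questions are hypothetical and are not taken from the repository; the sketch only mirrors the string layout visible in the quoted prompt and says nothing about why the code does this.

```python
STEP_TAG = "ки"  # step marker exactly as it appears in the quoted prompt


def format_steps(steps):
    """Number each step and terminate it with the step tag."""
    return "\n".join(
        f"Step {i}: {text} {STEP_TAG}" for i, text in enumerate(steps, start=1)
    )


def build_prompt(instruction, example_question, example_steps, new_question):
    """Prepend one worked example to a new question (hypothetical layout)."""
    example = f"{example_question}\n{format_steps(example_steps)}"
    return f"{instruction}\n\n{example}\n\n{new_question}\n"


# Toy inputs, chosen only to keep the example short.
prompt = build_prompt(
    "Let's solve math problems step by step.",
    "What is 2 + 3?",
    ["2 + 3 = 5.", "The answer is: 5."],
    "What is 4 + 4?",
)
print(prompt)
```

Printing `prompt` makes it easy to compare the assembled string against what the model actually receives.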

morning9393 commented 1 month ago

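Why does the demo training dataset contain only the question and the final answer? Because RL training does not need the intermediate steps. The policy is optimized using the scores that the PRM assigns to the intermediate steps generated by the LLM.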
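As a rough illustration of that idea, the sketch below scores each prefix of a generated solution with a PRM-like function, so the dataset record itself only needs a question and a final answer. Everything here is hypothetical: `score_partial_solutions` and `toy_prm` are illustrative names, and `toy_prm` is a trivial stand-in rather than the Math-Shepherd PRM or the repository's reward code.

```python
from typing import Callable, List


def score_partial_solutions(
    question: str,
    steps: List[str],
    prm_score: Callable[[str, List[str]], float],
) -> List[float]:
    """Return one score per step, each computed on the solution prefix so far."""
    return [prm_score(question, steps[: i + 1]) for i in range(len(steps))]


def toy_prm(question: str, partial_steps: List[str]) -> float:
    # Hypothetical stand-in for a real PRM: reward a step that shows an equation.
    return 1.0 if "=" in partial_steps[-1] else 0.0


# The training record carries no intermediate steps.
record = {"question": "What is 16 - 3 - 4?", "answer": "9"}

# In RL training these steps would be sampled from the policy LLM.
generated_steps = ["16 - 3 = 13", "13 - 4 = 9", "The answer is 9"]

rewards = score_partial_solutions(record["question"], generated_steps, toy_prm)
print(rewards)
```

The per-step scores would then feed the policy update; how they are combined with the final-answer check is up to the training code and is not shown here.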

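Is the "!!!!!!!!!!!" output normal? Does it appear right from the start, or only after training for a while? If it is there from the start, that is probably abnormal and may be related to that warning. If it only appears after training for some time, the model may have overfitted.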

wphtrying commented 1 month ago

The initialization here uses

Yes, the "!!!!!" output is caused by transformer inference producing abnormal output on the 910. I have not located the root cause yet.

kechunFIVE commented 1 month ago

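Have you tracked it down yet? In my experience, "!!!" output usually points to a problem with the model weights, for example running them in fp16.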
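One way fp16 can lead to this symptom, sketched in pure Python under stated assumptions: a value above the half-precision maximum (65504) overflows to inf, a later "subtract the max" step turns inf - inf into NaN, and a naive greedy argmax then falls back to index 0. The helper names below are hypothetical, the argmax is a toy rather than any framework's implementation, and whether token id 0 decodes to "!" for this particular tokenizer is an assumption that has not been verified in this thread.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a float to IEEE 754 half precision; overflow becomes +/-inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def shift_by_max(logits):
    """Subtract the maximum, as a softmax would; inf - inf yields NaN."""
    top = max(logits)
    return [v - top for v in logits]


def greedy_token(logits):
    """Toy argmax: comparisons with NaN are False, so NaN entries never win."""
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best


full_precision = [1.0, 70000.0, 3.0]          # toy logits, token 1 should win
half_precision = [to_fp16(v) for v in full_precision]

print(greedy_token(shift_by_max(full_precision)))   # picks index 1
print(greedy_token(shift_by_max(half_precision)))   # collapses to index 0
```

If this is what is happening, checking the logits for NaN or inf before decoding, or loading the weights in bf16 or fp32 instead, would be a reasonable first experiment.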

openreasoner / openr

questions about Training using train_llm.sh #13