microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

The reward value did not increase. #883

Open Sun-Shiqi opened 3 months ago

Sun-Shiqi commented 3 months ago

When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone could share a normal reward curve.

Sun-Shiqi commented 3 months ago

[Image: reward curve]

This is my reward curve.