hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: train_reward_model acc is low #3534

Closed guoweigang closed 1 year ago

guoweigang commented 1 year ago

🐛 Describe the bug

I used the default rm_static dataset with the following settings:

train_data: 75000 samples
pretrain_model: bloomz-1b1
batch_size: 8
max_epochs: 4
max_len: 256
machine: 2 V100 32GB GPUs
loss_fn: log_sig

After 3 hours of training, the loss fluctuates randomly and the accuracy stays low, below 0.6. Is something wrong in the reward model training stage?

Details:

Train step of epoch 4: 99%|█████████▉| 4740/4766 [1:02:51<00:15, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 99%|█████████▉| 4741/4766 [1:02:51<00:15, 1.63it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 99%|█████████▉| 4742/4766 [1:02:52<00:14, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4743/4766 [1:02:52<00:14, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4744/4766 [1:02:53<00:13, 1.63it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4745/4766 [1:02:54<00:12, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4746/4766 [1:02:54<00:12, 1.63it/s, dist=2.39, acc=0.555]
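To make the reported numbers easier to interpret, here is a minimal pure-Python sketch of a pairwise log-sigmoid ranking loss (the usual meaning of `loss_fn: log_sig`) and two pair metrics. It assumes, without confirmation from ColossalAI's code, that `dist` is the mean reward margin (chosen minus rejected) and `acc` is the fraction of pairs where the chosen response scores higher than the rejected one; the function names `log_sig_loss` and `pair_metrics` are hypothetical and not part of the ColossalAI API.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sig_loss(chosen, rejected):
    """Hypothetical pairwise loss: mean of -log(sigmoid(r_chosen - r_rejected)).

    Uses the identity -log(sigmoid(m)) == softplus(-m).
    """
    total = 0.0
    for c, r in zip(chosen, rejected):
        total += softplus(-(c - r))
    return total / len(chosen)


def pair_metrics(chosen, rejected):
    """Hypothetical metrics (assumed, not confirmed to match ColossalAI).

    dist: mean margin between chosen and rejected rewards.
    acc:  fraction of pairs where the chosen reward is strictly higher.
    """
    margins = [c - r for c, r in zip(chosen, rejected)]
    dist = sum(margins) / len(margins)
    acc = sum(1 for m in margins if m > 0) / len(margins)
    return dist, acc


if __name__ == "__main__":
    chosen = [1.0, 0.5, -0.2, 2.0]
    rejected = [0.0, 0.7, -0.5, 1.0]
    print(log_sig_loss(chosen, rejected))
    print(pair_metrics(chosen, rejected))
```

Under these assumed definitions, an `acc` near 0.5 means the model ranks pairs only slightly better than chance, and a margin of 0 gives a per-pair loss of log 2 (about 0.693).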

Environment

No response

binmakeswell commented 1 year ago

Hi @guoweigang, this was probably caused by an inappropriate shell command in the example script. We have fixed it in #3490; see the updated script at https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/examples/train_rm.sh. Thanks.