🐛 Describe the bug
I am training a reward model on the default rm_static dataset with these settings:
train_data: 75000
pretrain_model: bloomz-1b1
batch_size: 8
max_epochs: 4
max_len: 256
machine: 2 x V100 32 GB
loss_fn: log_sig
After 3 hours of training, the loss fluctuates with no clear trend and the accuracy stays low, below 0.6.
Is there anything wrong with the reward model training stage?
Training log near the end of epoch 4:
Train step of epoch 4: 99%|█████████▉| 4740/4766 [1:02:51<00:15, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 99%|█████████▉| 4741/4766 [1:02:51<00:15, 1.63it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 99%|█████████▉| 4742/4766 [1:02:52<00:14, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4743/4766 [1:02:52<00:14, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4744/4766 [1:02:53<00:13, 1.63it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4745/4766 [1:02:54<00:12, 1.64it/s, dist=2.39, acc=0.555]
Train step of epoch 4: 100%|█████████▉| 4746/4766 [1:02:54<00:12, 1.63it/s, dist=2.39, acc=0.555]
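As a reference point for reading the acc value in this log, here is a minimal pure-Python sketch of a pairwise log-sigmoid loss and the matching pairwise accuracy. It assumes that log_sig means -log(sigmoid(r_chosen - r_rejected)) averaged over the batch, and that acc is the fraction of pairs where the chosen response scores higher. Both are assumptions about the trainer, and the function names below are hypothetical, not taken from the training code.

```python
import math


def log_sig_loss(chosen, rejected):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over reward pairs (assumed definition)."""
    total = 0.0
    for c, r in zip(chosen, rejected):
        d = c - r
        # -log(sigmoid(d)) == log(1 + exp(-d)), written in a form that cannot overflow.
        total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / len(chosen)


def pairwise_accuracy(chosen, rejected):
    """Fraction of pairs where the chosen reward is strictly higher (assumed definition)."""
    correct = sum(1 for c, r in zip(chosen, rejected) if c > r)
    return correct / len(chosen)


if __name__ == "__main__":
    # A model that cannot separate the two responses gives loss = ln(2).
    print(log_sig_loss([0.0, 0.0], [0.0, 0.0]))
    # Half of these pairs are ranked correctly.
    print(pairwise_accuracy([1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 1.0, 3.0]))
```

Under these assumptions, a model that ranks pairs no better than a coin flip sits near acc = 0.5, and a model that gives both responses the same reward has a loss of ln(2), about 0.693, so the 0.555 in the log is only slightly above chance.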
Environment
No response