microsoft / DeepSpeedExamples

Example models using DeepSpeed

Can not use bloom-560m model in the step2_reward_model_finetuning #479

Open korlin0110 opened 1 year ago

korlin0110 commented 1 year ago

In step 1, I can use a BLOOM model such as bloom-3b. But in step 2, when I use bloom-560m for reward fine-tuning, I get this error:

Running training
Evaluating reward, Epoch 0/1
Traceback (most recent call last):
  File "main.py", line 352, in <module>
    main()
  File "main.py", line 306, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "main.py", line 252, in evaluation_reward
    outputs = model(**batch)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Project/DeepSpeedExamples/applications/DeepSpeed-Chat/training/utils/model/reward_model.py", line 95, in forward
    assert divergence_ind > 0
AssertionError

How can I fix this error, or does step 2 simply not work with bloom-560m?

MyHerbTea commented 1 year ago

Modify two places in the main.py file.

Step 1: set "--num_padding_at_beginning" to 0.

Step 2: Find the code that loads the tokenizer and set the padding side to right:

tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True)
tokenizer.padding_side = 'right'

Hope it helps you! (^_^)

korlin0110 commented 1 year ago


Thanks for your help! I modified the two places in the main.py file of step2_reward_model_finetuning, like this:

parser.add_argument(
    "--num_padding_at_beginning",
    type=int,
    default=0,
    help=
    "OPT model has a fixed number (1) of padding tokens at the beginning of the input. "
    "We did not see this in other models but keep it as an option for now.",
)
.....
tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True)
tokenizer.padding_side = 'right'

tokenizer.pad_token = tokenizer.eos_token
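
As a sanity check (a minimal sketch using the standard Hugging Face transformers API, not part of main.py), this shows where bloom-560m puts its padding after the changes above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m", use_fast=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token

# With right padding, the pad tokens of the shorter sequence go at the end,
# so two sequences that share a prompt also share their first tokens.
batch = tokenizer(["Human: hi\nAssistant: hello", "Human: hi"],
                  padding=True, return_tensors="pt")
print(batch["input_ids"])
print(batch["attention_mask"])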

But the error persists:

Running training
Evaluating reward, Epoch 0/1
Traceback (most recent call last):
  File "main.py", line 353, in <module>
    main()
  File "main.py", line 307, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "main.py", line 253, in evaluation_reward
    outputs = model(**batch)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Project/DeepSpeedExamples_Bloom/applications/DeepSpeed-Chat/training/utils/model/reward_model.py", line 95, in forward
    assert divergence_ind > 0
AssertionError
Traceback (most recent call last):
  File "main.py", line 353, in <module>
    main()
  File "main.py", line 307, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "main.py", line 253, in evaluation_reward
    outputs = model(**batch)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Project/DeepSpeedExamples_Bloom/applications/DeepSpeed-Chat/training/utils/model/reward_model.py", line 95, in forward
    assert divergence_ind > 0
AssertionError
[2023-05-05 22:08:02,856] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 37886
[2023-05-05 22:08:02,859] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 37887
[2023-05-05 22:08:02,859] [ERROR] [launch.py:434:sigkill_handler] ['/workspace/anaconda3/envs/deepspeed/bin/python', '-u', 'main.py', '--local_rank=1', '--data_path', 'Dahoas/rm-static', 'Dahoas/full-hh-rlhf', 'Dahoas/synthetic-instruct-gptj-pairwise', 'yitingxie/rlhf-reward-datasets', '--data_split', '2,4,4', '--model_name_or_path', 'bigscience/bloom-560m', '--num_padding_at_beginning', '0', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--max_seq_len', '512', '--learning_rate', '5e-5', '--weight_decay', '0.1', '--num_train_epochs', '1', '--disable_dropout', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--zero_stage', '0', '--deepspeed', '--output_dir', './output'] exits with return code = 1

Is there something I'm missing?

BaiStone2017 commented 1 year ago

Same error here. It works after changing the assertion to assert divergence_ind >= 0.

yaozhewei commented 1 year ago

Removing the divergence_ind assertion should work. It is a bit odd to me that divergence_ind is 0, which means the two sequences differ from the very first token. Both the chosen and rejected answers should begin with the same prompt (query), so you may want to verify the dataset.
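
For example, a minimal sketch of such a check (assuming the Dahoas/rm-static schema with "prompt", "chosen" and "rejected" fields; other datasets may use different field names):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m", use_fast=True)
ds = load_dataset("Dahoas/rm-static", split="test")

bad = 0
for sample in ds.select(range(100)):
    chosen_ids = tokenizer(sample["prompt"] + sample["chosen"]).input_ids
    rejected_ids = tokenizer(sample["prompt"] + sample["rejected"]).input_ids
    # Both sequences should start with the same prompt tokens;
    # if they differ at token 0, divergence_ind would be 0 and the assert fires.
    if chosen_ids[0] != rejected_ids[0]:
        bad += 1
print(f"pairs that diverge at the first token: {bad}/100")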

zhangzhenyu13 commented 1 year ago

Same error here. It works after changing the assertion to assert divergence_ind >= 0.

If you do this, serious errors will occur, because the prompt becomes misaligned when the input ids are processed. The reward model computes its loss from the pair:

chosen: prompt + response (chosen)
rejected: prompt + response (rejected)

divergence_ind should point at the response part. Because BLOOM pads on the left, starting the loss computation from a wrong index is not correct.

I think we should change the padding style.
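
Concretely, a toy sketch (simplified; the real logic in reward_model.py differs in details) of how the padding side changes where the divergence index lands:

import torch

# Toy token ids: shared prompt = [5, 6], chosen response = [7, 8],
# rejected response = [9], pad id = 0.

# Right padding: both sequences begin with the shared prompt.
chosen_right   = torch.tensor([5, 6, 7, 8, 0, 0])
rejected_right = torch.tensor([5, 6, 9, 0, 0, 0])

# Left padding: the shorter sequence gets an extra leading pad, so the first
# mismatch falls in the padding/prompt region instead of the response part.
chosen_left    = torch.tensor([0, 0, 5, 6, 7, 8])
rejected_left  = torch.tensor([0, 0, 0, 5, 6, 9])

def divergence_index(a, b):
    # first position where the two sequences differ (simplified)
    return (a != b).nonzero()[0].item()

print(divergence_index(chosen_right, rejected_right))  # 2 -> first response token
print(divergence_index(chosen_left, rejected_left))    # 2 -> a pad vs. a prompt token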