korlin0110 opened this issue 1 year ago
Modify two places in the main.py file.
Step 1: Set `--num_padding_at_beginning` to 0.
Step 2: Find the code that loads the tokenizer and add `tokenizer.padding_side = 'right'`:

```python
tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True)
tokenizer.padding_side = 'right'
```
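As a quick sanity check after Step 2 (an illustrative sketch only: bloom-560m is just an example checkpoint, and this uses the plain transformers API instead of the repo's load_hf_tokenizer helper), you can confirm the padding side took effect:

```python
# Illustrative sanity check that right padding is in effect.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
tokenizer.padding_side = "right"
batch = tokenizer(["short", "a much longer prompt here"], padding=True)
# Pad token ids should now appear at the END of the shorter sequence.
print(batch["input_ids"])
```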
Hope it helps you! (^_^)
Thanks for your help! I modified the two places in the main.py file of step2_reward_model_finetuning, like this:

```python
parser.add_argument(
    "--num_padding_at_beginning",
    type=int,
    default=0,
    help="OPT model has a fixed number (1) of padding tokens at the beginning of the input. "
    "We did not see this in other models but keep it as an option for now.",
)
...
tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True)
tokenizer.padding_side = 'right'
```
But the error persists:

```
***** Running training *****
***** Evaluating reward, Epoch 0/1 *****
Traceback (most recent call last):
  File "main.py", line 353, in <module>
```
Is there something I'm missing?
Same error. It works when I change it to `assert divergence_ind >= 0`.
Removing the divergence_ind assertion should work. It is a bit weird to me that divergence_ind is 0, which means the sequences differ from the very first token. The prompt part of both the chosen and rejected answers should begin with the same prompt (query). You may want to verify the dataset.
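A minimal sketch of such a check, assuming each example stores its chosen/rejected token ids as plain lists (the field names chosen_input_ids and rejected_input_ids are hypothetical; adapt them to your data):

```python
# Flag pairs whose chosen/rejected sequences already differ at token 0,
# which is exactly the situation that trips `assert divergence_ind > 0`.
def find_misaligned_pairs(dataset):
    bad_indices = []
    for i, example in enumerate(dataset):
        chosen = example["chosen_input_ids"]      # hypothetical field name
        rejected = example["rejected_input_ids"]  # hypothetical field name
        if not chosen or not rejected or chosen[0] != rejected[0]:
            bad_indices.append(i)
    return bad_indices
```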
> Same error. It works when I change it to `assert divergence_ind >= 0`.
If you do this, serious errors will occur, because the prompt becomes misaligned in the input-ids computation. The reward model computes its loss based on chosen: prompt + response (chosen) and rejected: prompt + response (rejected). divergence_ind should align with the response part; because BLOOM pads on the left, it is not correct to start the loss computation from a wrong index.
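For context, here is a simplified sketch of the index logic in reward_model.py (paraphrased, not a verbatim copy of the repository code):

```python
import torch

def first_divergence(chosen_id: torch.Tensor, rejected_id: torch.Tensor) -> int:
    """Index of the first token where a chosen/rejected pair differs."""
    check_divergence = (chosen_id != rejected_id).nonzero()
    divergence_ind = check_divergence[0].item()
    # The assertion that fails in this issue: with a shared prompt and
    # right padding, the pair should agree on at least the first token.
    assert divergence_ind > 0
    return divergence_ind
```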
I think we should change the padding style.
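A toy example with made-up token ids shows why left padding breaks this index: the pads at the front absorb the length difference, so the pair diverges before the shared prompt even lines up.

```python
# pad_id = 3; shared prompt = [7, 8, 9]; responses differ in content/length.
# Left padding puts the pads first, so positionwise comparison misaligns the prompts.
chosen = [3, 3, 7, 8, 9, 10, 11]     # two pads + prompt + chosen response
rejected = [3, 7, 8, 9, 10, 12, 13]  # one pad + prompt + rejected response
first_diff = next(i for i, (c, r) in enumerate(zip(chosen, rejected)) if c != r)
print(first_diff)  # 1: inside the padding, not at the start of the responses
```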
In step 1, I can use a BLOOM model such as bloom-3b. But in step 2, when I use bloom-560m for reward finetuning, the error message is:
```
***** Running training *****
***** Evaluating reward, Epoch 0/1 *****
Traceback (most recent call last):
  File "main.py", line 352, in <module>
    main()
  File "main.py", line 306, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "main.py", line 252, in evaluation_reward
    outputs = model(**batch)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/workspace/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Project/DeepSpeedExamples/applications/DeepSpeed-Chat/training/utils/model/reward_model.py", line 95, in forward
    assert divergence_ind > 0
AssertionError
```
How can I fix this error? Or does step 2 simply not work with bloom-560m?