microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

Error when using BLOOMZ for reward model training #338

Open Luoyang144 opened 1 year ago

Luoyang144 commented 1 year ago

Hello, I'm trying to use BLOOMZ for reward model training and get this error:

```
Traceback (most recent call last):
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 349, in <module>
    main()
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 303, in main
    reward_score, acc = evaluation_reward(rm_model, eval_dataloader)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/single_node/../../main.py", line 249, in evaluation_reward
    outputs = model(**batch)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
    loss = self.module(*inputs, **kwargs)
  File "/users5/xydu/anaconda3/envs/dpchat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/users5/xydu/ChatGPT/DeepSpeed-Chat/training/utils/model/reward_model.py", line 97, in forward
    assert divergence_ind > 0
AssertionError
```

After printing divergence_ind I found that it is 0. If I change assert divergence_ind > 0 to assert divergence_ind >= 0, will this affect the program?

cokuehuang commented 1 year ago

I also ran into this problem. Has it been solved?

LuciusMos commented 1 year ago

This problem is due to the 560m & 7b1 BLOOMZ models using left-padding by default, which is really weird :( You can change the padding style to right-padding to avoid this problem. BTW, changing ">" to ">=" will not affect the program. However, this program is designed for right-padding, so left-padding will lead to completely wrong results.
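To make the padding point concrete, here is a small illustrative sketch (toy tensors and a simplified divergence computation of my own, not the repo's exact reward_model.py code) of how the first index where the chosen and rejected sequences differ behaves under the two padding styles. With right-padding the shared prompt tokens come first, so the first differing index is positive; with left-padding the leading pad region shifts the sequences, so index 0 can already differ and assert divergence_ind > 0 trips.

```python
import torch

PAD = 3  # hypothetical pad token id

# Right-padded pair: shared prompt [5, 6, 7] first, then diverging responses, then pads.
chosen_right   = torch.tensor([5, 6, 7, 11, 12, PAD, PAD])
rejected_right = torch.tensor([5, 6, 7, 21, 22, 23, PAD])
print((chosen_right != rejected_right).nonzero()[0].item())  # 3 -> divergence_ind > 0 holds

# Left-padded pair: pads go in front, so responses of different length shift the
# whole sequence and the very first position can already differ.
chosen_left   = torch.tensor([5, 6, 7, 11, 12, 13, 14])   # long response, no pad needed
rejected_left = torch.tensor([PAD, 5, 6, 7, 21, 22, 23])  # shorter response, one leading pad
print((chosen_left != rejected_left).nonzero()[0].item())  # 0 -> assert divergence_ind > 0 fails
```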

Luoyang144 commented 1 year ago

@LuciusMos Thank you! By the way, for others using BLOOM, I advise adding 1e-7 to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.

cokuehuang commented 1 year ago

Reward model training succeeded, but evaluating the reward model with rw_eval.py via this command:

```
python rw_eval.py --model_name_or_path reward_model/bloom-560m --num_padding_at_beginning 0
```

fails with this error:

```
OSError: Can't load tokenizer for 'reward_model/bloom-560m'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'reward_model/bloom-560m' is the correct path to a directory containing all relevant files for a BloomTokenizerFast tokenizer.
```

All files in reward_model/bloom-560m:

```
├── config.json
├── merges.txt
├── pytorch_model.bin
├── training.log
└── vocab.json
```

However, if I choose an OPT model in step 2, rw_eval.py works fine.

Luoyang144 commented 1 year ago

@cokuehuang Maybe you should upgrade your transformers version.

cokuehuang commented 1 year ago

My transformers version is 4.29.0.dev0.

cokuehuang commented 1 year ago

Maybe transformers/src/transformers/models/bloom/tokenization_bloom_fast.py needs VOCAB_FILES_NAMES = {"tokenizer_file": "tokenizer.json"}, but the result of BLOOM training in step 2 does not contain this file. However, OPT's VOCAB_FILES_NAMES is {"vocab_file": "vocab.json", "merges_file": "merges.txt", "tokenizer_file": "tokenizer.json"}.
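A possible workaround (my assumption, not an official fix): since the step-2 BLOOM output folder lacks tokenizer.json, re-export the fast tokenizer from the base checkpoint into the reward-model directory so BloomTokenizerFast can be loaded locally. The base model name below is an assumption; use whichever checkpoint you fine-tuned from.

```python
# Hypothetical workaround: save the base model's fast tokenizer (tokenizer.json etc.)
# into the step-2 output directory so rw_eval.py can load it from there.
from transformers import AutoTokenizer

base_model = "bigscience/bloomz-560m"    # assumption: the checkpoint step 2 started from
output_dir = "reward_model/bloom-560m"   # the directory from the error message

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.save_pretrained(output_dir)
```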

lc222 commented 1 year ago

> This problem is due to the 560m & 7b1 BLOOMZ models using left-padding by default, which is really weird :( You can change the padding style to right-padding to avoid this problem. BTW, changing ">" to ">=" will not affect the program. However, this program is designed for right-padding, so left-padding will lead to completely wrong results.

How?

LuciusMos commented 1 year ago

> > This problem is due to the 560m & 7b1 BLOOMZ models using left-padding by default […] You can change the padding style to right-padding to avoid this problem.
>
> How?

@lc222 Just add the padding_side="right" kwarg in the tokenizer init function. For example: tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True, padding_side="right")
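In case your copy of the load_hf_tokenizer helper does not forward a padding_side argument (I have not verified its signature), a hedged alternative is to set the attribute directly on the Hugging Face tokenizer after loading it; the checkpoint name below is only an example.

```python
# Minimal sketch: force right-padding on a Hugging Face tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", use_fast=True)
tokenizer.padding_side = "right"  # BLOOMZ tokenizers may default to "left"

batch = tokenizer(["short prompt", "a somewhat longer prompt"],
                  padding=True, return_tensors="pt")
# With padding_side == "right", pad tokens are appended at the end of each row.
```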

LiinXemmon commented 1 year ago

> @LuciusMos Thank you! By the way, for others using BLOOM, I advise adding 1e-7 to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.

I set the padding side to right and clamped the loss to avoid inf. The training can run without error, but it gives "Grad overflow" at every iteration. How did you solve that?

Luoyang144 commented 1 year ago

@LiinXemmon Hi, this is caused by log(0), which returns inf. I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.
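As a concrete sketch of that fix (assumed function and variable names, not the repo's exact code): the pairwise reward loss has the form -log(sigmoid(r_chosen - r_rejected)), and in low precision the sigmoid can underflow to exactly 0, so the log gives inf. Adding a small epsilon inside the log keeps it finite.

```python
import torch

def pairwise_reward_loss(c_truncated_reward: torch.Tensor,
                         r_truncated_reward: torch.Tensor,
                         eps: float = 1e-7) -> torch.Tensor:
    # -log(sigmoid(chosen - rejected) + eps); the epsilon keeps the log finite
    # even when the sigmoid underflows to 0.
    return -torch.log(
        torch.sigmoid(c_truncated_reward - r_truncated_reward) + eps
    ).mean()

# A pair where the rejected reward is much larger than the chosen one, i.e. the
# regime where sigmoid(...) gets vanishingly small and the log would otherwise blow up.
chosen = torch.tensor([-20.0, 1.0])
rejected = torch.tensor([20.0, 0.5])
print(pairwise_reward_loss(chosen, rejected))  # finite
```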

lukaswangbk commented 1 year ago

> @LiinXemmon Hi, this is caused by log(0), which returns inf. I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid an inf loss during training.

Hi Luoyang, I have added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. When using zero_stage = 3, the loss scale drops to the minimum (1 here) and the error is raised immediately after training starts. Changing to zero_stage = 0 also constantly shows the Grad Overflow problem, though it can be trained.

LiinXemmon commented 1 year ago

> > @LiinXemmon Hi, this is caused by log(0), which returns inf. […]
>
> Hi Luoyang, I have added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. […]

I solved the "Grad overflow" issue by using bf16 rather than the default fp16. Adding 1e-7 in reward_model.py works for me to avoid the inf loss. I modified the line to: loss += -torch.log(torch.sigmoid(c_truncated_reward - r_truncated_reward) + 1e-7).mean()

zhan0903 commented 1 year ago

> I solved the "Grad overflow" issue by using bf16 rather than the default fp16. Adding 1e-7 in reward_model.py works for me to avoid the inf loss. […]

How do I use bf16 rather than fp16?

LiGhtime commented 1 year ago

> > I solved the "Grad overflow" issue by using bf16 rather than the default fp16. […]
>
> How do I use bf16 rather than fp16?

I changed fp16 to bf16 in this file: DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/utils/ds_utils.py. Like this:

```python
return {
        "train_batch_size": GLOBAL_BATCH_SIZE,
        "train_micro_batch_size_per_gpu": MICRO_BATCH_SIZE,
        "steps_per_print": 10,
        "zero_optimization": zero_opt_dict,
        "bf16": {  # changed from fp16 to bf16
            "enabled": True,
            # Note: loss_scale_window is an fp16 loss-scaling option; bf16 does not
            # use dynamic loss scaling, so this key is probably unnecessary here.
            "loss_scale_window": 100
        },
        "gradient_clipping": 1.0,
        "prescale_gradients": False,
        "wall_clock_breakdown": False,
        "hybrid_engine": {
            "enabled": enable_hybrid_engine,
            "max_out_tokens": max_out_tokens,
            "inference_tp_size": inference_tp_size,
            "release_inference_cache": release_inference_cache,
            "pin_parameters": pin_parameters,
            "tp_gather_partition_size": tp_gather_partition_size,
        },
    }
```

However, I am not sure whether this is THE right way to do it.

scarydemon2 commented 1 year ago

> I solved the "Grad overflow" issue by using bf16 rather than the default fp16. Adding 1e-7 in reward_model.py works for me to avoid the inf loss. […]

great job