Luoyang144 opened this issue 1 year ago
I also ran into this problem. Has it been solved?
This problem occurs because the 560m & 7b1 bloomz models use left-padding by default, which is really weird :( You can switch the padding style to right-padding to avoid it. BTW, changing ">" to ">=" will not affect the program. However, this program is designed for right-padding, so left-padding will produce completely wrong results.
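If you want to double-check this yourself, here is a minimal sketch (using the plain transformers AutoTokenizer rather than the repo's helper) to inspect and override the padding side:

import torch  # not strictly needed here, just matches the training environment
from transformers import AutoTokenizer

# Inspect the default padding side shipped with the bloomz tokenizer config,
# then override it to right-padding as suggested above.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", use_fast=True)
print(tokenizer.padding_side)       # reportedly "left" for bloomz
tokenizer.padding_side = "right"    # force right-padding for this pipeline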
@LuciusMos Thank you! By the way, for others using BLOOM, I advise adding 1e-7 to the difference of the two sentences' rewards; it will help you avoid inf loss during training.
Reward model training succeeds, but evaluating the reward model with rw_eval.py via this command: python rw_eval.py --model_name_or_path reward_model/bloom-560m --num_padding_at_beginning 0
gives this error:
OSError: Can't load tokenizer for 'reward_model/bloom-560m'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'reward_model/bloom-560m' is the correct path to a directory containing all relevant files for a BloomTokenizerFast tokenizer.
All files in reward_model/bloom-560m:
├── config.json
├── merges.txt
├── pytorch_model.bin
├── training.log
└── vocab.json
However, if I choose an OPT model in step 2, rw_eval.py works fine.
@cokuehuang Maybe you should upgrade your transformers version.
My transformers version is 4.29.0.dev0. Maybe transformers/src/transformers/models/bloom/tokenization_bloom_fast.py needs VOCAB_FILES_NAMES = {"tokenizer_file": "tokenizer.json"}, but the BLOOM output from step-2 training does not contain that file. However, OPT's VOCAB_FILES_NAMES = {"vocab_file": "vocab.json", "merges_file": "merges.txt", "tokenizer_file": "tokenizer.json"}.
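As a possible workaround (just a sketch on my side, assuming the reward model was finetuned from bigscience/bloom-560m; adjust the hub id to whatever base checkpoint was actually used), the base tokenizer can be re-saved into the step-2 output directory so that BloomTokenizerFast finds its tokenizer.json there:

from transformers import AutoTokenizer

# Load the tokenizer from the original hub checkpoint and write its files
# (including tokenizer.json) into the finetuned reward-model directory.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m", use_fast=True)
tokenizer.save_pretrained("reward_model/bloom-560m")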
How do I change the padding style to right-padding?
@lc222 Just add the padding_side="right" kwarg in the tokenizer init function. For example: tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True, padding_side="right")
I set the padding side to right and clamped the loss to avoid inf. Training runs without error, but it reports "Grad overflow" at every iteration. How did you solve that?
@LiinXemmon Hi, this is caused by log(0), which returns inf. I think you should add a very small value (like 1e-7) to the difference of the two sentences' rewards; it will help you avoid inf loss during training.
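A tiny illustration of the failure mode (my own sketch, not code from the repo): in fp16 the sigmoid of a large negative reward gap underflows to exactly zero, so the log becomes inf, while the epsilon keeps it finite.

import torch

# The sigmoid of a large negative (chosen - rejected) reward gap underflows
# to 0 when stored in fp16, so -log of it is inf; adding 1e-7 keeps it finite.
gap = torch.tensor(-30.0)                 # hypothetical reward difference
p = torch.sigmoid(gap).to(torch.float16)  # underflows to 0 in fp16
print(p)                                  # tensor(0., dtype=torch.float16)
print(-torch.log(p.float()))              # tensor(inf)
print(-torch.log(p.float() + 1e-7))       # ~16.1, finite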
Hi Luoyang, I have added 1e-7 in the reward_model.py file under the utils/model folder, but it still hits the inf loss issue. When using zero_stage = 3, the loss scale drops to the minimum (1 here) and the error is raised immediately after training starts. Changing to zero_stage = 0 also constantly shows the Grad Overflow message, though training can proceed.
I solved "Grad overflow" by using bf16
rather than the default fp16
. Adding 1e-7
to the reward_model.py
file works for me to avoid inf
loss. I modified the line as
loss += -torch.log( torch.sigmoid(c_truncated_reward - r_truncated_reward)+1e-7).mean()
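As an alternative I have not tried in the repo itself: torch.nn.functional.logsigmoid computes log(sigmoid(x)) in a numerically stable way, so something like the line below (assuming the same loss and c_truncated_reward / r_truncated_reward tensors as above) should also avoid the inf without the epsilon:

import torch.nn.functional as F

# logsigmoid avoids the intermediate sigmoid underflow entirely
loss += -F.logsigmoid(c_truncated_reward - r_truncated_reward).mean()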
How do I use bf16 rather than fp16?
I changed fp16 to bf16 in this file: DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/utils/ds_utils.py. Like this:
return {
    "train_batch_size": GLOBAL_BATCH_SIZE,
    "train_micro_batch_size_per_gpu": MICRO_BATCH_SIZE,
    "steps_per_print": 10,
    "zero_optimization": zero_opt_dict,
    "bf16": {  # changed from fp16 to bf16
        "enabled": True,
        "loss_scale_window": 100
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": False,
    "wall_clock_breakdown": False,
    "hybrid_engine": {
        "enabled": enable_hybrid_engine,
        "max_out_tokens": max_out_tokens,
        "inference_tp_size": inference_tp_size,
        "release_inference_cache": release_inference_cache,
        "pin_parameters": pin_parameters,
        "tp_gather_partition_size": tp_gather_partition_size,
    }
}
However, I am not sure whether this is THE right way to do it.
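My understanding (not verified against the DeepSpeed docs for this exact version) is that bf16 does not use dynamic loss scaling, so the loss_scale_window entry carried over from the fp16 block is probably unnecessary, and bf16 also needs hardware support (e.g. NVIDIA Ampere or newer). A minimal version of that section would just be:

"bf16": {  # bf16 has no loss scaling, so no loss_scale_window is needed
    "enabled": True
},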
great job
Hello, I'm trying to use BLOOMZ for reward model training and hit an assertion error. After printing divergence_ind, I find it is 0. I changed assert divergence_ind > 0 to assert divergence_ind >= 0; will this affect the program?
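To make the connection to padding concrete, here is a small illustration (my own sketch with made-up token ids, not the repo's exact code) of why left-padding can push the first divergence index to 0:

import torch

# With right padding, chosen and rejected share their prompt prefix, so the
# first index where the token ids differ is > 0. With left padding, the pads
# sit at the front and the shorter sequence is shifted, so position 0 can
# already differ and divergence_ind == 0 trips `assert divergence_ind > 0`.
pad = 3                         # hypothetical pad token id
chosen   = [5, 6, 7, 10, 11]    # prompt [5, 6, 7] + chosen answer
rejected = [5, 6, 7, 20]        # prompt [5, 6, 7] + rejected answer
max_len = 5

right_c = torch.tensor(chosen)
right_r = torch.tensor(rejected + [pad] * (max_len - len(rejected)))
left_c  = torch.tensor(chosen)
left_r  = torch.tensor([pad] * (max_len - len(rejected)) + rejected)

print((right_c != right_r).nonzero()[0].item())  # 3 -> assert passes
print((left_c != left_r).nonzero()[0].item())    # 0 -> assert fails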