I was running the script from step 3: python3 train.py --step 3 --deployment-type single_gpu
The training.log shows this:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
[2023-07-13 06:32:07,236] [INFO] [fused_optimizer.py:362:_update_scale]
Grad overflow on iteration 3
[2023-07-13 06:32:07,236] [INFO] [fused_optimizer.py:363:_update_scale] Reducing dynamic loss scale from 8192.0 to 4096.0
[2023-07-13 06:32:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
epoch: 0|step: 7|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 8|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 9|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 10|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 11|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 12|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 13|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 14|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 15|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
epoch: 0|step: 16|ppo_ep: 1|act_loss: -0.9267578125|cri_loss: 2.818359375|unsuper_loss: 0.0
average reward score: -16.859375
And I checked main.py; it contains:
# create common tokenizer based on actor model
tokenizer = load_hf_tokenizer(args.actor_model_name_or_path,
                              fast_tokenizer=True)
tokenizer.pad_token = tokenizer.eos_token
# make sure tokenizer is right pad in our logic
tokenizer.padding_side = 'right'
I am using the Dahoas/rm-static dataset in my case. So I wonder why this is happening. What should I do in this situation?
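For anyone wondering what the warning is actually about, here is a minimal sketch (no real model or tokenizer involved; PAD is a stand-in pad id, and pad_batch is a hypothetical helper, not part of the repo) of why right padding is a problem for batched generation with decoder-only models:

```python
# Hypothetical minimal illustration: why right padding breaks
# decoder-only generation. PAD is a stand-in pad-token id.
PAD = 0

def pad_batch(seqs, side):
    """Pad variable-length id sequences to equal length on the given side."""
    width = max(len(s) for s in seqs)
    if side == "right":
        return [s + [PAD] * (width - len(s)) for s in seqs]
    return [[PAD] * (width - len(s)) + s for s in seqs]

prompts = [[5, 6, 7], [8, 9]]

right = pad_batch(prompts, "right")  # [[5, 6, 7], [8, 9, PAD]]
left = pad_batch(prompts, "left")    # [[5, 6, 7], [PAD, 8, 9]]

# A decoder-only model appends new tokens after the LAST column of the
# batch. With right padding, the shorter prompt ends in PAD, so the model
# conditions on PAD instead of its real last token (9). With left padding,
# every row ends in a real token, which is what generate() expects.
print([row[-1] for row in right])  # [7, 0] <- PAD in the final slot
print([row[-1] for row in left])   # [7, 9] <- real tokens in the final slot
```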
Same question. I think current decoder-only models should use left padding, especially when generating. I don't know why the code uses right padding.
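One common workaround (a sketch, not the repo's actual fix) is to keep right padding for the training-side logic but temporarily flip padding_side to 'left' just around the generation call. FakeTokenizer below is a hypothetical stand-in; with a real Hugging Face tokenizer the same attribute is tokenizer.padding_side:

```python
# Hypothetical sketch: toggle padding_side to 'left' only while generating,
# then restore it so the rest of the training logic still sees right padding.
from contextlib import contextmanager

class FakeTokenizer:
    # stand-in for a Hugging Face tokenizer, which exposes the same attribute
    padding_side = "right"

@contextmanager
def left_padding(tok):
    prev = tok.padding_side
    tok.padding_side = "left"
    try:
        yield tok
    finally:
        tok.padding_side = prev  # restore the original setting

tok = FakeTokenizer()
with left_padding(tok):
    # batch-encode prompts and call model.generate(...) here
    assert tok.padding_side == "left"
assert tok.padding_side == "right"  # restored for the training path
```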