Open swang99 opened 1 year ago
Hi @swang99, thank you for reaching out! What model are you currently using?
Hi @swang99, thanks for reaching out. When doing RLHF, the same sequence gets propagated to all the models, so I would recommend:
Thanks for the recommendations. Unfortunately, the error still persists. Can I simply increase additional_prompt_tokens, or would I need to save a new actor model? Below is my config.yaml; a rough sanity check of the token budget follows it.
trainer_config:
  # learning rates
  actor_lr: 0.000005
  critic_lr: 0.000009
  # PPO hyperparameters
  actor_eps_clip: 0.2
  critic_eps_clip: 0.2
  beta_s: 0.02
  # coefficient for the discounted rewards
  gamma_discounted: 1
  # path to the examples to be sampled (training dataset), see rlhf_dataset.json
  examples_path: "./datasets/rlhf_training_data.json"
  # number of episodes and generations performed for each episode
  # in the train() method
  num_episodes: 100
  max_timesteps: 32
  # number of timesteps after which the learn() method is called
  # (to update the weights)
  update_timesteps: 32
  # number of examples sampled at each timestep
  num_examples: 1
  # batch size and epochs for the training
  batch_size: 1
  epochs: 1
  # number of episodes after which the checkpoints are updated in RL training
  checkpoint_steps: 10
  # name of the actor_rl checkpoint from which to resume
  # during actor RL training. If null, load the last one.
  checkpoint_name: null

actor_config:
  model: "facebook/opt-1.3b"
  model_folder: "./models"
  tokenizer_path: "path-to-tokenizer"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze the model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # (only for llama)
  use_fairscale: False
  # max sequence length for the actor (i.e. prompt + completion); it depends on
  # the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 2048
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for the template or as a safety margin
  additonal_prompt_tokens: 100
  # temperature for the actor
  temperature: 0.1
  batch_size: 1
  # number of iterations between prints
  iteration_per_print: 10
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps after which the checkpoints are saved
  checkpoint_steps: 3000
  # number of checkpoints to keep while removing the older ones
  # (keeps memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 2
  # name of the actor checkpoint from which to resume
  # during actor training. If null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

reward_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  model_folder: "./models"
  # hidden size of the additional ffw head that produces the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  train_dataset_path: "./datasets/reward_training_data.json"
  validation_dataset_path: null
  batch_size: 8
  epochs: 32
  iteration_per_print: 1
  # steps after which the checkpoints are saved
  checkpoint_steps: 10000
  # name of the reward checkpoint from which to resume
  # during reward training. If null, load the last one.
  checkpoint_name: null
  lr: 0.000009
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

critic_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  # hidden size of the additional ffw head that produces the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  model_folder: "./models"
  # name of the critic checkpoint from which to resume
  # during critic training. If null, load the last one.
  checkpoint_name: null
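For what it's worth, here is a rough sanity check of the token budget implied by the actor settings above. This is a hand-rolled sketch, not part of the library, and it assumes the prompt, the additional prompt tokens, and the completion all have to fit within max_sequence_length:

# Hand-rolled sanity check of the actor token budget (not library code).
# Assumption: prompt + additonal_prompt_tokens + completion must all fit
# within max_sequence_length.
max_sequence_length = 1024       # actor_config.max_sequence_length
max_tokens = 2048                # actor_config.max_tokens (completion only)
min_tokens = 100                 # actor_config.min_tokens
additonal_prompt_tokens = 100    # key spelled as in config.yaml

# Longest prompt that still leaves room for the minimum completion.
max_prompt_budget = max_sequence_length - min_tokens - additonal_prompt_tokens
print(f"prompt budget: {max_prompt_budget}")                    # 824

# Can the requested maximum completion ever fit?
print(f"max_tokens fits: {max_tokens <= max_sequence_length}")  # False (2048 > 1024)

Under that assumption, max_tokens: 2048 can never be reached with max_sequence_length: 1024, and any prompt longer than roughly 824 tokens leaves no room for min_tokens.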
Hi @swang99
I will test it in more detail in the following days and let you know!
Hi @swang99. I have found the problem and it should be fixed in PR #306. Let me know if you still have the same issue!
Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly due to a "loss is NaN" error. Has this been addressed in the past?
I have the same problem. Did you fix it?
Hi @Mialiu91 @swang99, yes, the problem should be fixed in #306, which will be merged soon. A method that checks the dataset before training starts is now implemented. Inside this method, entries whose score is None are removed from the dataset to avoid this error.
if isinstance(config, ConfigReward):
    # drop entries whose score is None to avoid NaN losses
    cnt = 0
    while cnt < len(conversations):
        if conversations[cnt]["score"] is None:
            # popping shifts the next element into position cnt,
            # so the index is not advanced here
            conversations.pop(cnt)
        else:
            cnt = cnt + 1
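For anyone who wants to check their own dataset before that PR is merged, the same filtering can be reproduced on a toy list. This is a standalone sketch; only the "score" field matters, and the other keys are made up for illustration:

# Standalone reproduction of the None-score filtering above.
conversations = [
    {"user_input": "What is RLHF?", "completion": "A fine-tuning method.", "score": 4.0},
    {"user_input": "Explain PPO.", "completion": "A policy-gradient algorithm.", "score": None},
    {"user_input": "Define reward model.", "completion": "A scoring model.", "score": 2.5},
]

cnt = 0
while cnt < len(conversations):
    if conversations[cnt]["score"] is None:
        conversations.pop(cnt)   # do not advance: the next item shifts into cnt
    else:
        cnt += 1

print(len(conversations))  # 2 -> the unscored entry has been dropped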
I am getting the following error when doing RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training failed for me when it was set to 2048. Is my actor max_sequence_length too small, and does this mean I have to redo pre-training with a larger max sequence? To my knowledge there isn't a way to change the state_length.
ValueError: The prompt is too long w.r.t the model sequence length max_sequence_length=1024 state_length=1024 min_tokens=100 max_tokens=2048 max_generation_possible=0
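Plugging the numbers from that message back in shows why generation cannot even start. The check below is reconstructed from the error text itself, not taken from the library source:

# Reconstruction of the length check implied by the error message above.
max_sequence_length = 1024   # prompt + completion budget of the actor
state_length = 1024          # tokens already consumed by the prompt/state
min_tokens = 100             # minimum completion length requested
max_tokens = 2048            # maximum completion length requested

max_generation_possible = max_sequence_length - state_length   # 1024 - 1024 = 0
if max_generation_possible < min_tokens:
    raise ValueError(
        "The prompt is too long w.r.t the model sequence length "
        f"max_sequence_length={max_sequence_length} state_length={state_length} "
        f"min_tokens={min_tokens} max_tokens={max_tokens} "
        f"max_generation_possible={max_generation_possible}"
    )

With the state already filling the whole sequence length, there is no room left even for min_tokens of generation, so either the prompt has to be shortened or truncated, or max_sequence_length increased, which in turn depends on the context length the chosen model supports.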