Open swang99 opened 1 year ago
Hi @swang99, thank you for reaching out! What model are you currently using?
Hi @swang99, thanks for reaching out. When doing RLHF, the same sequence gets propagated to all the models, so I would recommend:
Thanks for the recommendations. Unfortunately, the error still persists. Can I simply increase additional_prompt_tokens, or would I need to save a new actor model? Below is my config.yaml; a rough sanity check of the token budget follows it.
trainer_config:
  # learning rates
  actor_lr: 0.000005
  critic_lr: 0.000009
  # PPO hyperparameters
  actor_eps_clip: 0.2
  critic_eps_clip: 0.2
  beta_s: 0.02
  # coefficient for the discounted rewards
  gamma_discounted: 1
  # path to the examples to be sampled (training dataset), see rlhf_dataset.json
  examples_path: "./datasets/rlhf_training_data.json"
  # number of episodes and generations performed for each episode
  # in the train() method
  num_episodes: 100
  max_timesteps: 32
  # number of timesteps after which the learn() method is called
  # (to update the weights)
  update_timesteps: 32
  # number of examples sampled at each timestep
  num_examples: 1
  # batch size and epochs for the training
  batch_size: 1
  epochs: 1
  # number of episodes after which the checkpoints are updated in RL training
  checkpoint_steps: 10
  # name of the actor_rl checkpoint from which to resume
  # during actor RL training. If null, load the last one.
  checkpoint_name: null

actor_config:
  model: "facebook/opt-1.3b"
  model_folder: "./models"
  tokenizer_path: "path-to-tokenizer"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze the model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # (only for llama)
  use_fairscale: False
  # max sequence length for the actor (i.e. prompt + completion); it depends on
  # the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 2048
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for the template or as a safety margin
  additonal_prompt_tokens: 100
  # temperature for the actor
  temperature: 0.1
  batch_size: 1
  # number of iterations between prints
  iteration_per_print: 10
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps after which the checkpoints are saved
  checkpoint_steps: 3000
  # number of checkpoints to keep while removing the older ones
  # (keeps memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 2
  # name of the actor checkpoint from which to resume
  # during actor training. If null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

reward_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  model_folder: "./models"
  # hidden size of the additional ffw head that produces the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  train_dataset_path: "./datasets/reward_training_data.json"
  validation_dataset_path: null
  batch_size: 8
  epochs: 32
  iteration_per_print: 1
  # steps after which the checkpoints are saved
  checkpoint_steps: 10000
  # name of the reward checkpoint from which to resume
  # during reward training. If null, load the last one.
  checkpoint_name: null
  lr: 0.000009
  # deepspeed settings
  deepspeed_enable: False
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False

critic_config:
  # models to choose from are gpt2-large, bart-base, longformer-base-4096
  # more can simply be added in the reward.py __init__()
  model: "facebook/opt-125m"
  # hidden size of the additional ffw head that produces the scores
  model_head_hidden_size: 2048
  max_sequence_length: 1024
  model_folder: "./models"
  # name of the critic checkpoint from which to resume
  # during critic training. If null, load the last one.
  checkpoint_name: null
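For what it's worth, here is a rough sanity check of the token budget implied by the actor settings above. This is a hand-rolled sketch, not part of the library, and it assumes the prompt, the additional prompt tokens, and the completion all have to fit within max_sequence_length:

# Hand-rolled sanity check of the actor token budget (not library code).
# Assumption: prompt + additonal_prompt_tokens + completion must all fit
# within max_sequence_length.
max_sequence_length = 1024       # actor_config.max_sequence_length
max_tokens = 2048                # actor_config.max_tokens (completion only)
min_tokens = 100                 # actor_config.min_tokens
additonal_prompt_tokens = 100    # key spelled as in config.yaml

# Longest prompt that still leaves room for the minimum completion.
max_prompt_budget = max_sequence_length - min_tokens - additonal_prompt_tokens
print(f"prompt budget: {max_prompt_budget}")                    # 824

# Can the requested maximum completion ever fit?
print(f"max_tokens fits: {max_tokens <= max_sequence_length}")  # False (2048 > 1024)

Under that assumption, max_tokens: 2048 can never be reached with max_sequence_length: 1024, and any prompt longer than roughly 824 tokens leaves no room for min_tokens.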
Hi @swang99
I will test it in more detail in the following days and let you know!
Hi @swang99. I have found the problem and it should be fixed in PR #306. Let me know if you still have the same issue!
Hi @PierpaoloSorbellini, thank you for rolling out the fixes. This might not be very specific, but although I was able to get further into training, around the 9th timestep training stopped suddenly due to a "loss is NaN" error. Has this been addressed in the past?
I have the same problem. Did you fix it?
Hi @Mialiu91 @swang99, yes, the problem should be fixed in #306, which will be merged soon. A method that checks the dataset before training starts is now implemented. Inside this method, entries whose score is None are removed from the dataset to avoid this error.
if isinstance(config, ConfigReward):
    # drop entries whose score is None to avoid NaN losses
    cnt = 0
    while cnt < len(conversations):
        if conversations[cnt]["score"] is None:
            # popping shifts the next element into position cnt,
            # so the index is not advanced here
            conversations.pop(cnt)
        else:
            cnt = cnt + 1
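For anyone who wants to check their own dataset before that PR is merged, the same filtering can be reproduced on a toy list. This is a standalone sketch; only the "score" field matters, and the other keys are made up for illustration:

# Standalone reproduction of the None-score filtering above.
conversations = [
    {"user_input": "What is RLHF?", "completion": "A fine-tuning method.", "score": 4.0},
    {"user_input": "Explain PPO.", "completion": "A policy-gradient algorithm.", "score": None},
    {"user_input": "Define reward model.", "completion": "A scoring model.", "score": 2.5},
]

cnt = 0
while cnt < len(conversations):
    if conversations[cnt]["score"] is None:
        conversations.pop(cnt)   # do not advance: the next item shifts into cnt
    else:
        cnt += 1

print(len(conversations))  # 2 -> the unscored entry has been dropped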
I am getting the following error when doing RLHF training. I decreased max_sequence_length in my actor configuration to 1024 because training failed for me when it was set to 2048. Is my actor max_sequence_length too small, and does this mean I have to redo pre-training with a larger max sequence? To my knowledge there isn't a way to change the state_length.
ValueError: The prompt is too long w.r.t the model sequence length max_sequence_length=1024 state_length=1024 min_tokens=100 max_tokens=2048 max_generation_possible=0
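Plugging the numbers from that message back in shows why generation cannot even start. The check below is reconstructed from the error text itself, not taken from the library source:

# Reconstruction of the length check implied by the error message above.
max_sequence_length = 1024   # prompt + completion budget of the actor
state_length = 1024          # tokens already consumed by the prompt/state
min_tokens = 100             # minimum completion length requested
max_tokens = 2048            # maximum completion length requested

max_generation_possible = max_sequence_length - state_length   # 1024 - 1024 = 0
if max_generation_possible < min_tokens:
    raise ValueError(
        "The prompt is too long w.r.t the model sequence length "
        f"max_sequence_length={max_sequence_length} state_length={state_length} "
        f"min_tokens={min_tokens} max_tokens={max_tokens} "
        f"max_generation_possible={max_generation_possible}"
    )

With the state already filling the whole sequence length, there is no room left even for min_tokens of generation, so either the prompt has to be shortened or truncated, or max_sequence_length increased, which in turn depends on the context length the chosen model supports.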