Open zyzhang1130 opened 1 month ago
Hi @zyzhang1130
We do have tests that run save/load checks on `AutoModelWithValueHeadxxx`: https://github.com/huggingface/trl/blob/13454d2f4b243b7260fa4ec828297812c3f975fc/tests/test_modeling_value_head.py#L102 but here it seems you are using the `PeftModel` interface. Can you elaborate a bit on how you're training the model with PPO and saving it?
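For reference, the idea behind the linked test is a simple round trip: save the model (value head included), reload it into a fresh instance, and check the weights match. Here is a minimal sketch of that pattern using a tiny stand-in module (`TinyModelWithValueHead` is hypothetical, not a trl class):

```python
import os
import tempfile

import torch
import torch.nn as nn


# Hypothetical stand-in for a model with a value head: a base module
# plus an extra "v_head" layer, mirroring what the linked test checks.
class TinyModelWithValueHead(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.base = nn.Linear(hidden, hidden)
        self.v_head = nn.Linear(hidden, 1)  # scalar value estimate


model = TinyModelWithValueHead()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ckpt.pt")
    torch.save(model.state_dict(), path)        # save all weights, v_head included
    reloaded = TinyModelWithValueHead()
    reloaded.load_state_dict(torch.load(path))  # reload into a fresh instance
    # every tensor, including the value head, should round-trip exactly
    for k in model.state_dict():
        assert torch.equal(model.state_dict()[k], reloaded.state_dict()[k])
```

The real test does the same thing through `save_pretrained`/`from_pretrained`; the difficulty reported in this issue is that the PEFT path wraps the model differently, so the round trip is not guaranteed to behave the same way.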
loading:

```python
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    load_model_path, local_files_only=True,
    peft_config=lora_config,
    device_map="auto",
)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    load_model_path, local_files_only=True,
    load_in_8bit=True,
)
```
ppo:

```python
ppo_config = {
    'batch_size': negative_sample_size,  # Keep as is
    'learning_rate': 5e-7,               # Example value
    'mini_batch_size': 4,                # Adjusted
    'gradient_accumulation_steps': 1,    # Adjusted
    'optimize_cuda_cache': True,
    # Add any other configurations as needed
}
config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config=config, tokenizer=tokenizer, model=model, ref_model=ref_model)
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```

save:

```python
ppo_trainer.save_pretrained('path')
```
I have the same problem. Any update here?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Currently, I use `ppo_trainer.save_pretrained` to save a model that is still in training, because the machine I use is rather unstable, and I often need to resume training when it is interrupted. When I resume training I get the following warning:

I guess this is relevant to my case, since I need to resume PPO training. What is the proper way, then, to save a checkpoint during PPO training so that it can be resumed later?