huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

how to save v_head #1650

Open zyzhang1130 opened 1 month ago

zyzhang1130 commented 1 month ago

Currently, I use ppo_trainer.save_pretrained to save a model that is still in training, because the machine I use is rather unstable and I often need to resume training after an interruption. When I resume training, I get the following warning:

WARNING:root:A <class 'peft.peft_model.PeftModelForCausalLM'> model is loaded from 'RLGAF_gemma-7b-lima_sft_preprocessing_20epochs', and no v_head weight is found. This IS expected if you are not resuming PPO training.

I guess this is relevant to my case, since I do need to resume PPO training. What, then, is the proper way to save a PPO training checkpoint so that training can be resumed later?

younesbelkada commented 1 month ago

Hi @zyzhang1130, we do have tests that cover saving for AutoModelWithValueHeadxxx: https://github.com/huggingface/trl/blob/13454d2f4b243b7260fa4ec828297812c3f975fc/tests/test_modeling_value_head.py#L102 but here it seems you are using the PeftModel interface. Can you elaborate a bit on how you're training the model with PPO and saving it?
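For reference, a minimal round trip along the lines of what that test exercises could look like the sketch below (gpt2 is only a small stand-in base model here, and the exact assertions in the linked test may differ):

import torch
from trl import AutoModelForCausalLMWithValueHead

# Save a value-head model and reload it; the v_head weights should survive
# the round trip. gpt2 is just a small stand-in base model for illustration.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
model.save_pretrained("/tmp/value_head_checkpoint")

reloaded = AutoModelForCausalLMWithValueHead.from_pretrained("/tmp/value_head_checkpoint")
assert torch.allclose(model.v_head.summary.weight, reloaded.v_head.summary.weight)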

zyzhang1130 commented 1 month ago


loading:

# Policy model: base model + LoRA adapter (via peft_config), with a value head on top.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    load_model_path,
    local_files_only=True,
    peft_config=lora_config,
    device_map="auto",
)

# Frozen reference model for the PPO KL penalty, loaded in 8-bit to save memory.
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    load_model_path,
    local_files_only=True,
    load_in_8bit=True,
)

ppo:

ppo_config = {
    'batch_size': negative_sample_size,  # Keep as is
    'learning_rate': 5e-7,  # Example value
    'mini_batch_size': 4,  # Adjusted
    'gradient_accumulation_steps': 1,  # Adjusted
    'optimize_cuda_cache': True,
    # Add any other configurations as needed
}

config = PPOConfig(**ppo_config)
ppo_trainer = PPOTrainer(config=config, tokenizer=tokenizer, model=model, ref_model=ref_model)
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
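(For context, query_tensors, response_tensors and rewards come out of a generation/reward step roughly like the sketch below; queries and compute_reward are placeholders for the actual prompts and reward function, not part of the original code.)

# Rough sketch of where query_tensors / response_tensors / rewards come from.
# `queries` and `compute_reward` are placeholders for the real prompts and reward.
generation_kwargs = {"max_new_tokens": 64, "do_sample": True, "pad_token_id": tokenizer.eos_token_id}

query_tensors = [tokenizer(q, return_tensors="pt").input_ids.squeeze(0) for q in queries]
response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **generation_kwargs)
rewards = [torch.tensor(compute_reward(text)) for text in tokenizer.batch_decode(response_tensors)]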

save:

ppo_trainer.save_pretrained('path')
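A possible workaround, sketched below (not an official recipe; it assumes the wrapper exposes its value head as model.v_head), is to persist the value head's state dict next to the checkpoint and restore it after loading on resume:

import os
import torch

# save_pretrained on a PEFT-wrapped model may only write the adapter weights,
# so persist the value head separately alongside the checkpoint.
ppo_trainer.save_pretrained('path')
torch.save(model.v_head.state_dict(), os.path.join('path', 'v_head.pt'))

# When resuming, rebuild the model as in the loading snippet above,
# then restore the value head weights on top of it.
model.v_head.load_state_dict(torch.load(os.path.join('path', 'v_head.pt')))
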
paraGONG commented 3 weeks ago

I have the same problem. Any update here?

github-actions[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.