huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
10.04k stars 1.27k forks

how to save/load model? #1327

Closed ADoublLEN closed 6 months ago

ADoublLEN commented 9 months ago

I've tried saving the model via:

ppo_trainer.save_pretrained("./model_after_rl")

and load the model via:

model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl")

But when I add the loaded model to a new PPO trainer, freeze the model, and test again, the performance is the same as without any reinforcement learning.

ADoublLEN commented 9 months ago

BTW, I insert the loaded (trained) model into the PPO trainer, freeze the parameters, and define a dummy optimizer as:

dummy_param = torch.nn.Parameter(torch.empty(0))
optimizer = torch.optim.Adam([dummy_param], lr=1e-3)

It just performs like the model without any reinforcement learning.
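In full, that freeze-and-dummy-optimizer setup looks roughly like this (a sketch; model is whatever variable holds the reloaded policy):

import torch

# Freeze every parameter so a PPO step cannot change the policy.
for p in model.parameters():
    p.requires_grad = False

# Give the trainer an optimizer over a throwaway parameter; it has nothing
# meaningful to update, so the policy weights stay fixed during "training".
dummy_param = torch.nn.Parameter(torch.empty(0))
optimizer = torch.optim.Adam([dummy_param], lr=1e-3)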

younesbelkada commented 9 months ago

Hi @ADoublLEN, hmm, we should have CI tests that cover this specific scenario. Can you try with the latest transformers / trl?

pip install -U transformers trl

ADoublLEN commented 9 months ago

I just updated via pip install -U transformers trl, but the problem still exists.

First condition: reload from the saved pretrained checkpoint

ppo_trainer.save_pretrained("./model_after_rl_comb_reward")
model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl_comb_reward")
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)

Then test: same performance as with no RL (bad).

Second condition: back up the model and redefine the ppo_trainer

good_model = copy.deepcopy(model)
model = copy.deepcopy(good_model)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)

Then test: performance is the same as after RL (good).

Third condition: use the PPOTrainer from training directly for testing. Performance is the same as after RL (good).

Do you have any idea?
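(One way to pin down whether the save/load round trip is what loses the update is to compare generations from the in-memory trained model and the reloaded checkpoint on the same prompt. A rough sketch, where model is the in-memory model from training and the prompt, path, and generation settings are placeholders:)

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

tokenizer = AutoTokenizer.from_pretrained(config.model_name)
reloaded = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl_comb_reward")

inputs = tokenizer("A fixed evaluation prompt", return_tensors="pt")
with torch.no_grad():
    # Greedy decoding so the two outputs are directly comparable.
    out_trained = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    out_reloaded = reloaded.generate(**inputs, max_new_tokens=32, do_sample=False)

print(tokenizer.decode(out_trained[0], skip_special_tokens=True))
print(tokenizer.decode(out_reloaded[0], skip_special_tokens=True))
# If these differ, the reload is not restoring the trained weights.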

ADoublLEN commented 9 months ago

Maybe it's because of PEFT?

younesbelkada commented 9 months ago

I think I might have a clue - here we force-save the "pytorch_model.bin" for PEFT: https://github.com/huggingface/trl/blob/3b1911c2a99bc362992266a96743f67f3212218c/trl/models/modeling_base.py#L554. Are you using PEFT? Can you also print what is inside the saved folder?

ADoublLEN commented 9 months ago

Thanks for the fast reply

Yes, I used PEFT. Am I loading/saving the model incorrectly?

Input: these are the files of the model after SFT (the model I pass to the PPOTrainer):

config.json
generation_config.json
model.safetensors
special_tokens_map.json
tokenizer.json
tokenizer_config.json
training_args.bin
vocab.txt

Saved output: these are the files of the model after RL (saved by the PPOTrainer):

adapter_config.json
adapter_model.safetensors
pytorch_model.bin
README.md
special_tokens_map.json
tokenizer.json
tokenizer_config.json
vocab.txt
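(For what it's worth, one way to check whether adapter_model.safetensors actually carries a learned update is to look at the LoRA matrices directly: freshly initialised lora_B weights are all zeros, so an all-zero adapter means nothing from training made it into the file. A small sketch, with the path as an example:)

from safetensors.torch import load_file

state = load_file("./model_after_rl/adapter_model.safetensors")
for name, tensor in state.items():
    if "lora_B" in name:
        # 0.0 everywhere means the adapter is effectively untrained.
        print(name, float(tensor.abs().max()))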

ADoublLEN commented 9 months ago

Here is how I define the input model (the model after SFT):

from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name, peft_config=lora_config)
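(Side note: with a peft_config attached like this, only the LoRA matrices plus the value head end up trainable, which is consistent with the saved folder above containing adapter files rather than full weights. A quick way to inspect what will actually be updated:)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable tensors")
print(trainable[:5])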

GSalimp commented 8 months ago

Did you solve it? I'm having the same issue: I fine-tuned a Llama 7B model using PEFT and got satisfying results at inference, but when I use SFTTrainer.save_model and load the model from the saved files with LlamaForCausalLM.from_pretrained, the inference results look like those of the non-fine-tuned model.

ADoublLEN commented 8 months ago

> Did you solve it? I'm having the same issue: I fine-tuned a Llama 7B model using PEFT and got satisfying results at inference, but when I use SFTTrainer.save_model and load the model from the saved files with LlamaForCausalLM.from_pretrained, the inference results look like those of the non-fine-tuned model.

You will need to fuse the SFT checkpoint into your original model. A wrapped-up pipeline for this has been proposed: https://github.com/hiyouga/LLaMA-Factory
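(A minimal sketch of that fusing step using PEFT directly, assuming the saved SFT checkpoint is a LoRA adapter folder; the directory names are placeholders:)

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the adapter applied, then fold the adapter into the weights.
model = AutoPeftModelForCausalLM.from_pretrained("./sft_adapter_checkpoint")
merged = model.merge_and_unload()

# The merged checkpoint can now be loaded as a plain model, e.g. with LlamaForCausalLM.
merged.save_pretrained("./sft_merged")
tokenizer = AutoTokenizer.from_pretrained("./sft_adapter_checkpoint")
tokenizer.save_pretrained("./sft_merged")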

chenweilong915 commented 8 months ago

I also ran into this problem. When I use the SFTTrainer with PEFT and save the model, the saved model behaves the same as the original model, and when I print the model parameters they are identical.
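(For reference, a direct way to confirm that is to diff the parameters of the original model and the reloaded checkpoint; the model name and path below are placeholders:)

import torch
from transformers import AutoModelForCausalLM

original = AutoModelForCausalLM.from_pretrained("base-model-name")
reloaded = AutoModelForCausalLM.from_pretrained("./saved_model_dir")

reloaded_params = dict(reloaded.named_parameters())
identical = all(
    torch.equal(p, reloaded_params[name]) for name, p in original.named_parameters()
)
print("checkpoints are identical:", identical)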

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.