By the way, I insert the loaded (trained) model into the PPO trainer, freeze its parameters, and define a dummy optimizer as:
dummy_param = torch.nn.Parameter(torch.empty(0))
optimizer = torch.optim.Adam([dummy_param], lr=1e-3)
It performs exactly like the model without any reinforcement learning.
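For context, here is a sketch of the evaluation setup described above, assuming the older PPOTrainer API used later in this thread (which accepts an optimizer argument) and reusing its config, ref_model, tokenizer, dataset, and collator objects:

import torch
from trl import AutoModelForCausalLMWithValueHead, PPOTrainer

# Reload the checkpoint saved after RL and freeze it so nothing can update it.
model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl_comb_reward")
for p in model.parameters():
    p.requires_grad = False

# Dummy optimizer so the trainer has something to step without touching the model.
dummy_param = torch.nn.Parameter(torch.empty(0))
optimizer = torch.optim.Adam([dummy_param], lr=1e-3)

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer,
                         dataset=dataset, data_collator=collator,
                         optimizer=optimizer)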
Hi @ADoublLEN
Hmm, we should have CI tests that cover this specific scenario. Can you try with the latest transformers / trl?
pip install -U transformers trl
I just updated via pip install -U transformers trl, but the problem still exists.
First condition: Reload from pretrained
ppo_trainer.save_pretrained("./model_after_rl_comb_reward")
model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl_comb_reward")
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)
Then test: performance is the same as without RL (bad).
Second condition: back up the model and redefine the ppo_trainer
good_model = copy.deepcopy(model)
model = copy.deepcopy(good_model)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)
Then test: performance is the same as after RL (good).
Third condition: use the PPOTrainer from training directly for testing: performance is the same as after RL (good).
Do you have any idea? Maybe it's because of PEFT?
I think I might have a clue - here we force-save the "pytorch_model.bin" for PEFT: https://github.com/huggingface/trl/blob/3b1911c2a99bc362992266a96743f67f3212218c/trl/models/modeling_base.py#L554. Are you using PEFT? Can you also print what is inside the saved folder?
Thanks for the fast reply
Yes, I used PEFT. Am I saving/loading the model incorrectly?
Input: files in the model after SFT (this is the model I pass to the PPOTrainer):
config.json, generation_config.json, model.safetensors, special_tokens_map.json, tokenizer.json, tokenizer_config.json, training_args.bin, vocab.txt
Saved output: files in the model after RL (saved by the PPOTrainer):
adapter_config.json, adapter_model.safetensors, pytorch_model.bin, README.md, special_tokens_map.json, tokenizer.json, tokenizer_config.json, vocab.txt
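Note that the RL output contains only the LoRA adapter files plus pytorch_model.bin, not the full merged weights. For reference, the workaround I'm experimenting with (a sketch, not confirmed to fix it) is to rebuild the model by loading the original base checkpoint, attaching the saved adapter explicitly with PEFT, and re-wrapping it with a value head. The value head is freshly initialized here, which should be fine for generation-only evaluation, and this assumes the trl version accepts a PeftModel instance in from_pretrained:

from peft import PeftModel
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

base = AutoModelForCausalLM.from_pretrained(config.model_name)            # same base checkpoint used for SFT
base = PeftModel.from_pretrained(base, "./model_after_rl_comb_reward")    # attach the saved LoRA adapter
model = AutoModelForCausalLMWithValueHead.from_pretrained(base)           # re-wrap with a (fresh) value head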
Here is how I define the input model (the model after SFT):
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name, peft_config=lora_config)
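As a quick sanity check (not part of my original script), one can list the trainable tensors right after building the model to confirm the LoRA adapter was actually injected:

# With LoRA you should mostly see lora_A / lora_B tensors (plus the value head);
# the frozen base weights should not appear here.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable tensors")
print([n for n in trainable if "lora" in n.lower()][:5])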
Did you solve it? I'm having the same issue: I fine-tuned a Llama 7B model using PEFT and got satisfying results in inference, but when I save with SFTTrainer.save_model and load the model from the saved files using LlamaForCausalLM.from_pretrained, the inference results seem to be those of the non-fine-tuned model.
You will need to fuse the SFT checkpoint into your original model. A wrapped-up pipeline has been proposed: https://github.com/hiyouga/LLaMA-Factory
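A minimal sketch of that fusing step with plain PEFT (the base checkpoint name and paths below are placeholders, not taken from this thread):

from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # your original base model
model = PeftModel.from_pretrained(base, "./sft_output")              # folder containing adapter_config.json + adapter weights
merged = model.merge_and_unload()                                    # fold the LoRA deltas into the base weights
merged.save_pretrained("./sft_merged")                               # standalone checkpoint, loadable with LlamaForCausalLM.from_pretrained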
I also ran into this problem. When I use the SFTTrainer with PEFT and save the model, the saved model behaves the same as the original model, and when I print the model parameters they are identical.
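One hedged way to confirm that symptom (paths are placeholders, and this assumes both folders hold full checkpoints of the same architecture; if the save only wrote an adapter, attach/merge it with PEFT first):

import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model-name")
saved = AutoModelForCausalLM.from_pretrained("./sft_output")

changed = [
    name
    for (name, p_base), (_, p_saved) in zip(base.named_parameters(), saved.named_parameters())
    if not torch.equal(p_base, p_saved)
]
print(len(changed), "tensors differ")  # 0 means the LoRA updates never made it into the saved weights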
I've tried saving the model via:
ppo_trainer.save_pretrained("./model_after_rl")
and loading the model via:
model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("./model_after_rl")
But when I add the loaded model to a new PPO trainer, freeze it, and test again, the performance is the same as without any reinforcement learning.