Katehuuh closed this issue 1 week ago
Is the template parameter set incorrectly at inference time?
No, I performed experiments with both the alpaca and sharegpt formats, and the `template: llama3` setting was taken from:
https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/train_lora/llama3_lora_dpo.yaml#L14
The default `dpo_en_demo` command itself works just fine:
https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/README.md?plain=1#L53
https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/train_lora/llama3_lora_dpo.yaml#L13
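For context, the relevant portion of that example config looks roughly like the following (paraphrased from the linked `llama3_lora_dpo.yaml`, so treat the exact values as an approximation; switching `pref_loss` to `simpo` is what reproduces the failure):

```yaml
### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # changing this to `simpo` triggers the hallucinated output

### dataset
dataset: dpo_en_demo
template: llama3
```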
This could indicate an issue with the french_orca_rlhf dataset; however, training on it with `sigmoid` works just fine.
`"hf_hub_url": "jpacifico/french-orca-dpo-pairs-revised",`
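For completeness, the full `dataset_info.json` entry for this dataset would look something like the sketch below. The `ranking: true` flag is LLaMA-Factory's marker for preference/pairwise datasets; the column names are my guess at the dataset's schema, not confirmed from the repo:

```json
"french_orca_rlhf": {
  "hf_hub_url": "jpacifico/french-orca-dpo-pairs-revised",
  "ranking": true,
  "columns": {
    "prompt": "question",
    "system": "system",
    "chosen": "chosen",
    "rejected": "rejected"
  }
}
```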
Edit:
Maybe this is caused by an older config of NousResearch/Meta-Llama-3.1-8B-Instruct; I will try the hyperparameters from the paper.
Reminder
System Info
llamafactory version: 0.8.4.dev0, on commit c93d55bf
Reproduction
Training loss graphs:
`simpo`: ![simpo](https://github.com/user-attachments/assets/e819ee30-ea47-4ead-bc7a-c89608f773f2)
`sigmoid`: ![sigmoid](https://github.com/user-attachments/assets/732c8608-b531-40ff-8bc6-53734bbf4e93)
`llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml`
`dataset_info.json`: dataset provided to fully replicate the issue.
Expected behavior
When applying `pref_loss: simpo` in DPO training, inference with every saved LoRA checkpoint results in hallucinated output (`HeaderCodeHeaderCodeHeader ...`). However, using `sigmoid` works just fine.
Other failed attempts:
- `max_steps: 1000`
- `learning_rate: 2.0e-4`
- `quantization_bit`
- `use_adam_mini: true`
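For anyone comparing the two settings, here is a minimal scalar sketch of the two losses (my own simplification, not LLaMA-Factory's actual implementation; `beta`/`gamma` defaults are only illustrative):

```python
import math

def dpo_sigmoid_loss(logp_chosen, logp_rejected,
                     ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO `sigmoid` loss: reward is the policy/reference log-ratio."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -logsigmoid(margin)

def simpo_loss(logp_chosen, logp_rejected,
               len_chosen, len_rejected, beta=2.5, gamma=1.25):
    """SimPO: reference-free, length-normalized reward with a target margin gamma."""
    margin = (beta * logp_chosen / len_chosen
              - beta * logp_rejected / len_rejected
              - gamma)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

SimPO drops the reference model and normalizes rewards by sequence length, so it typically needs a much larger `beta` and a much smaller learning rate than the `sigmoid` defaults; reusing the `sigmoid` config unchanged with `pref_loss: simpo` is a plausible cause of the degenerate outputs reported above.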
Others
No response