hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

SimPO: unexpected gibberish output at inference #5181

Closed Katehuuh closed 1 week ago

Katehuuh commented 1 month ago

Reminder

System Info

on commit c93d55bf

Reproduction

Training loss graphs:

`simpo`: ![simpo](https://github.com/user-attachments/assets/e819ee30-ea47-4ead-bc7a-c89608f773f2)
`sigmoid`: ![sigmoid](https://github.com/user-attachments/assets/732c8608-b531-40ff-8bc6-53734bbf4e93)

`llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml`:

```yaml
### model
model_name_or_path: NousResearch/Meta-Llama-3.1-8B-Instruct
quantization_bit: 4

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1
pref_loss: simpo  # choices: [sigmoid (dpo), orpo, simpo]
use_adam_mini: true

### dataset
dataset: french_orca_rlhf-revised
template: llama3
cutoff_len: 4096
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/LLaMA3.1-8B-Chat/lora/QLoRA_french_dpo
logging_steps: 10
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
flash_attn: fa2
optim: paged_adamw_8bit
```

`dataset_info.json`:

```json
{
  "french_orca_rlhf-revised": {
    "hf_hub_url": "jpacifico/french-orca-dpo-pairs-revised",
    "ranking": true,
    "columns": {
      "prompt": "question",
      "chosen": "chosen",
      "rejected": "rejected",
      "system": "system"
    }
  },
  ...
```

The dataset is provided so the issue can be fully replicated.

Expected behavior

When applying `pref_loss: simpo` in DPO training, inference with any of the saved LoRA checkpoints produces gibberish output such as HeaderCodeHeaderCodeHeader... However, using `sigmoid` works just fine.
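
For context, a comparison the config above does not spell out: `simpo` optimizes a reference-free, length-normalized objective with a target reward margin gamma, while `sigmoid` is the standard DPO loss over policy/reference log-ratios. Up to notation, the two losses from the SimPO and DPO papers are:

```math
\mathcal{L}_{\text{SimPO}} = -\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma\right)
```

```math
\mathcal{L}_{\text{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)
```

Because SimPO has no reference model anchoring the policy and divides the log-likelihood by response length, a `pref_beta` of 0.1 that is typical for DPO is well below the 2.0–2.5 range reported in the SimPO paper, which may contribute to the degenerate output seen here.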

Other failed attempt:

Others

No response

AlexYoung757 commented 3 weeks ago

Is the `template` parameter set incorrectly at inference time?

Katehuuh commented 3 weeks ago

> Is the `template` parameter set incorrectly at inference time?

No. I did run experiments with the alpaca and sharegpt formats, but the `template: llama3` I used is the same as in https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/train_lora/llama3_lora_dpo.yaml#L14, and the default `dpo_en_demo` command itself works just fine: https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/README.md?plain=1#L53 https://github.com/hiyouga/LLaMA-Factory/blob/c8b4c7fee5398654683b713ad5c03b5daf13218a/examples/train_lora/llama3_lora_dpo.yaml#L13

This could indicate an issue with the french_orca_rlhf dataset; however, `sigmoid` works just fine on the same data.

"hf_hub_url": "jpacifico/french-orca-dpo-pairs-revised",

Edit: This may be caused by an older config of NousResearch/Meta-Llama-3.1-8B-Instruct; when loading Llama 3.1 in LLaMA-Factory, it applies a RoPE scaling factor of 2.0 instead of the 8.0 recommended by the config:
```
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.3",
  "use_cache": true,
  "vocab_size": 128256
}

INFO - llamafactory.model.model_utils.rope - Using linear scaling strategy and setting scaling factor to 2.0
```

hiyouga commented 1 week ago

Try the hyperparameters from the SimPO paper.
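
A sketch of what that might look like, layered on the reporter's config above. The numbers are illustrative, taken from the ranges reported in the SimPO paper rather than tuned for this dataset, and `simpo_gamma` is LLaMA-Factory's name for the target reward margin:

```yaml
### method (illustrative SimPO overrides; keep the rest of the config unchanged)
pref_loss: simpo
pref_beta: 2.5          # the paper uses beta around 2.0-2.5 rather than the DPO-typical 0.1
simpo_gamma: 1.4        # target reward margin, tuned per model in the paper (LLaMA-Factory defaults to 0.5)

### train
learning_rate: 1.0e-6   # SimPO runs use learning rates on the order of 1e-6 or lower
num_train_epochs: 1.0   # preference optimization is usually run for a single epoch
```

With `quantization_bit: 4` and LoRA the best values may shift further, so treat this as a starting point rather than a recipe.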