hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Segmentation fault when Lora DPO with Phi-3-mini-128k-instruct #3801

Closed: zjc17 closed this issue 5 months ago

zjc17 commented 5 months ago

Reproduction

CUDA_VISIBLE_DEVICES=4,5,6,7 llamafactory-cli train phi-3-mini-128k-dpo-0518.yaml

### model
model_name_or_path: microsoft/Phi-3-mini-128k-instruct

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all  # alternatively: qkv_proj
lora_rank: 16
lora_alpha: 16
dpo_ftx: 1.0

### dataset
dataset: orca_pairs
template: phi
cutoff_len: 4096
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: ~/workspace/saves/Phi-3-mini-128k-instruct/lora/sft/dpo-0518
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.000005
num_train_epochs: 4.0
lr_scheduler_type: cosine
warmup_ratio: 0.1  # warmup_steps expects an integer step count; a fractional warmup is warmup_ratio
fp16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500

### report
report_to: wandb
run_name: phi-3-mini-128k-dpo-0518

Expected behavior

No response

System Info

No response

Others

  0%|                                                                                                         | 0/112 [00:00<?, ?it/s]
You are not running the flash-attention implementation, expect numerical differences.
~/miniconda3/envs/llama-factory/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Segmentation fault (core dumped)
hiyouga commented 5 months ago

See the multi-GPU LoRA fine-tuning examples:
https://github.com/hiyouga/LLaMA-Factory/tree/main/examples#lora-fine-tuning-on-multiple-gpus
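
The linked examples launch multi-GPU LoRA jobs through a distributed launcher rather than a bare llamafactory-cli call, which the reply suggests is the fix here. A minimal sketch assuming the repository's src/train.py entry point and a torchrun launch (the YAML filename is the reporter's):

# Spawn one process per visible GPU; the HF Trainer picks up the distributed env vars set by torchrun.
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node 4 src/train.py phi-3-mini-128k-dpo-0518.yaml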