hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

FSDP fine-tuning of llama3-70b produces completely wrong inference results #3620

Closed ben-8878 closed 6 months ago

ben-8878 commented 6 months ago

Training command:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage sft  \
    --do_train \
    --model_name_or_path ../Chinese-LLM-Chat/models/Meta-Llama-3-70B-Instruct \
    --dataset sjcy_sft_zh,general_intension_sft_zh,in3_interaction_zh,cot_zh,sharegpt4_local,comparison_gpt4_zh  \
    --template llama3 \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir finetuned_models/intention-llama3-70b  \
    --cutoff_len 32768 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 5000 \
    --eval_steps 5000 \
    --learning_rate 5e-5 \
    --num_train_epochs 6.0 \
    --plot_loss \
    --ddp_timeout 1800000 \
    --val_size 0.001 \
    --quantization_bit 4 \
    --shift_attn \
    --rope_scaling linear \
    --fp16
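
For reference, the FSDP settings come from the `examples/accelerate/fsdp_config.yaml` passed via `--config_file`. A minimal sketch of such a config, assuming the layout LLaMA-Factory shipped at the time (values here are illustrative, e.g. `num_processes: 4` to match the four visible GPUs, and are not copied from the user's setup):

compute_environment: LOCAL_MACHINE
distributed_type: FSDP                      # shard the 70B model across processes
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_offload_params: true                 # offload parameters to CPU to save VRAM
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16                       # matches the --fp16 flag above
num_machines: 1
num_processes: 4                            # one process per visible GPU
use_cpu: false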

Inference command:

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/cli_demo.py \
    --model_name_or_path ../../model_hub/Meta-Llama-3-70B-Instruct-hf \
    --template llama3 \
    --finetuning_type lora \
    --adapter_name_or_path finetune_models/intention-llama3-70b-4k/checkpoint-15000

Inference result example 1: [image]
Inference result example 2: [image]
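
Once the adapter produces correct outputs, it can be merged into the base weights for standalone deployment. A sketch using the export entry point from the same era of the codebase (`src/export_model.py`; arguments assumed from that version's conventions). Note that the merge must target the full-precision base model: a QLoRA adapter trained with --quantization_bit 4 cannot be merged into 4-bit quantized weights.

CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path ../../model_hub/Meta-Llama-3-70B-Instruct-hf \
    --adapter_name_or_path finetune_models/intention-llama3-70b-4k/checkpoint-15000 \
    --template llama3 \
    --finetuning_type lora \
    --export_dir merged_models/intention-llama3-70b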

hiyouga commented 6 months ago

Disable shift_attn

ben-8878 commented 6 months ago

Disable shift_attn

Do I need to retrain with shift_attn disabled?

hiyouga commented 6 months ago

Yes.

ben-8878 commented 6 months ago

shift_attn is supposed to extend the context length. Is this option currently unusable, or is it that FSDP and shift_attn are incompatible?
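
For background: --shift_attn enables shift short attention (S²-Attn) from LongLoRA (https://arxiv.org/abs/2309.12307). It approximates full attention over long sequences by computing attention within fixed-size token groups, while shifting half of the attention heads by half a group so information still flows across group boundaries. A minimal sketch of that grouping step (hypothetical function and tensor names, not LLaMA-Factory's actual implementation):

import torch

def shift_group_attention_inputs(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    # qkv: (batch, seq_len, num_heads, head_dim); seq_len must divide evenly into groups.
    bsz, seq_len, num_heads, head_dim = qkv.shape
    assert seq_len % group_size == 0
    shifted = qkv.clone()
    # Shift the second half of the heads back by half a group, so their
    # attention groups straddle the boundaries of the unshifted groups.
    shifted[:, :, num_heads // 2:] = torch.roll(
        qkv[:, :, num_heads // 2:], shifts=-(group_size // 2), dims=1
    )
    # Fold groups into the batch dimension; attention is then computed
    # independently within each group of group_size tokens.
    return shifted.reshape(bsz * (seq_len // group_size), group_size, num_heads, head_dim)

With --cutoff_len 32768 and a group size of, say, 8192, each sample splits into four groups, so the quadratic attention cost is paid per group rather than over the full sequence.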