Inference output is cut off and incomplete after fine-tuning Qwen

chb630 commented 3 months ago

Reminder

[X] I have read the README and searched the existing issues.

System Info

transformers>=4.37.2 datasets>=2.14.3 accelerate>=0.27.2 peft>=0.10.0 trl>=0.8.1 gradio>=4.0.0 scipy einops sentencepiece protobuf uvicorn pydantic fastapi sse-starlette matplotlib>=3.7.0 fire packaging pyyaml

Reproduction

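Does the length at inference time matter? If so, do I need to truncate the number of history turns at inference so that the maximum output token count is not exceeded? In other words, should I increase the max_new_tokens parameter at inference and also truncate the history while chatting?

The result is shown in the screenshot:

Training script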

export NCCL_IB_DISABLE="1"
deepspeed --num_gpus $gpus --master_port=9901 src/train.py \
    --deepspeed zero2.json \
    --stage sft \
    --do_train \
    --model_name_or_path qwen/Qwen15-14B-Chat \
    --dataset data \
    --template qwen \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir checkpoints/lora \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type linear \
    --logging_steps 10 \
    --save_steps 1500 \
    --learning_rate 1e-4 \
    --weight_decay 0.1 \
    --adam_beta1 0.95 \
    --adam_beta2 0.98 \
    --num_train_epochs 4 \
    --lora_rank 32 \
    --lora_alpha 64 \
    --lora_dropout 0.1 \
    --temperature 0.9 \
    --plot_loss \
    --max_new_tokens 8192 \
    --cutoff_len 8192 \
    --bf16 true \
    --quantization_bit 4

Inference script

CUDA_VISIBLE_DEVICES=0 API_PORT=6006 llamafactory-cli chat \
    --model_name_or_path qwen/Qwen15-14B-Chat \
    --adapter_name_or_path checkpoints/lora/checkpoint-4500 \
    --finetuning_type lora \
    --template qwen \
    --max_new_tokens 4090 \
    --temperature 0.9 \
    --quantization_bit 4 \
    --repetition_penalty 1.1

hiyouga commented 3 months ago

Did the training data include multi-turn conversations?

chb630 commented 3 months ago

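Yes. I seem to have solved it: after truncating the history to a limited number of turns, the problem went away. Was it caused by the maximum generation length (max_new_tokens) exceeding the configured threshold?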
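
For anyone hitting the same symptom, here is a minimal sketch of the history truncation described above. It is not LLaMA-Factory code: `truncate_history` and `count_tokens_stub` are hypothetical names, the word-count "tokenizer" is only a stand-in for the real one, and treating `cutoff_len` (8192) as the total budget is an assumption the thread does not confirm. With the values from the scripts, that assumption would leave 8192 - 4090 = 4102 tokens for history plus the new query.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def count_tokens_stub(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words.

    Swap in the model's real tokenizer for anything beyond a demo.
    """
    return len(text.split())


def truncate_history(
    history: List[Message],
    query: str,
    context_budget: int,
    max_new_tokens: int,
    count_tokens: Callable[[str], int] = count_tokens_stub,
) -> List[Message]:
    """Keep the most recent turns such that prompt + max_new_tokens fits the budget.

    Assumes `history` alternates user/assistant messages, oldest first.
    """
    prompt_budget = context_budget - max_new_tokens
    if prompt_budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")

    used = count_tokens(query)
    kept: List[Message] = []
    # Walk backwards in (user, assistant) pairs so turns stay aligned.
    for i in range(len(history) - 2, -1, -2):
        pair = history[i:i + 2]
        cost = sum(count_tokens(m["content"]) for m in pair)
        if used + cost > prompt_budget:
            break  # this pair and everything older is dropped
        kept = pair + kept
        used += cost
    return kept
```

Dropping whole user/assistant pairs, oldest first, keeps the remaining conversation well-formed for the chat template; whether this is the actual root cause here is still the open question in this thread.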

hiyouga / LLaMA-Factory

Inference output is cut off and incomplete after fine-tuning Qwen #4054
