hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Why is inference on a trained checkpoint through the train command's Eval so slow, roughly 2.4x slower than manually merging the adapter and writing my own inference script (5000 samples take 2h40m)? #5949

Closed FloatFrank closed 6 days ago

FloatFrank commented 6 days ago

Reminder

System Info

Reproduction

Why is running inference on a trained checkpoint directly through the train command's Eval so slow? It is roughly 2.4x or more slower than manually merging the adapter and writing my own inference script (5000 samples take about 2h40m). Manually merging adds extra steps, and it also takes up a lot of disk space when I need to run batch inference. Is there another way?
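
For reference, the manual merge step described here is usually done with PEFT. The following is only an illustrative sketch, assuming the base model and adapter paths are replaced with the real ones (the paths and dtype below are placeholders, not the exact setup used in this issue):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "path/to/base-model"        # base weights the LoRA adapter was trained on
adapter_path = "path/to/checkpoint"     # LoRA adapter checkpoint
merged_path = "path/to/merged-model"    # output directory for the merged weights

# Load the base model and attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the adapter weights into the base weights and save a standalone model.
model = model.merge_and_unload()
model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(merged_path)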

Expected behavior

No response

Others

No response

FloatFrank commented 6 days ago

yaml:

### model
model_name_or_path: E:\Autodl-Run-Results\Qwen2.5-7b-16r-***\merge-checkpoint-218400
adapter_name_or_path: E:\LargeLanguageModels\LLaMA-Factory\saves\qwen2.5-7b-continue-but-source\checkpoint-3750

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: ***
template: qwen
cutoff_len: 512
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: E:\LargeLanguageModels\LLaMA-Factory\saves\qwen2.5-7b-continue-but-source\predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
bf16: true
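
For comparison, the hand-written batched inference script mentioned in the issue could look roughly like the sketch below. It assumes the merged checkpoint produced by the merge step above; the path, batch size, and generation settings are illustrative and not taken from the actual setup:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "path/to/merged-model"   # merged checkpoint from the merge step
tokenizer = AutoTokenizer.from_pretrained(merged_path, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    merged_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

prompts = ["..."]  # the evaluation prompts (e.g. the 5000 samples)
batch_size = 16    # generate several samples per forward pass instead of one

outputs = []
for i in range(0, len(prompts), batch_size):
    batch = prompts[i : i + batch_size]
    inputs = tokenizer(
        batch, return_tensors="pt", padding=True, truncation=True, max_length=512
    ).to(model.device)
    with torch.no_grad():
        generated = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = generated[:, inputs["input_ids"].shape[1]:]
    outputs.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))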