hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Support for long text input after training #4366

Closed lxb0425 closed 4 months ago

lxb0425 commented 4 months ago

Reminder

System Info

After fine-tuning qwen2-72b-instruct and quantizing it to GPTQ 4-bit, deploying with the following command works fine and Q&A is also OK:

CUDA_VISIBLE_DEVICES=0,1 API_PORT=7864 llamafactory-cli api \
    --model_name_or_path /workspace/chat-1.1 \
    --template qwen \
    --infer_backend vllm \
    --vllm_enforce_eager true

However, I want to support long text input, so I added the configuration below following the official Qwen2 documentation (screenshot).
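The screenshot itself is not recoverable, but the long-context recipe in the official Qwen2 documentation adds a YaRN rope_scaling block to the model's config.json; the snippet below is only a sketch of that documented recipe, not the exact values used here:

{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}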

But it still reports an error (screenshot).

Reproduction

The other settings are as follows (screenshots). I tried changing max_new_tokens in generation_config.json to 20480, but then max_tokens in the request became 200 (screenshot).
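For reference, the server started by llamafactory-cli api exposes an OpenAI-compatible endpoint, so the generation length can also be pinned per request with max_tokens rather than through generation_config.json; the request below is only an illustration (model name and values are placeholders, the port mirrors the command above):

curl http://localhost:7864/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "chat-1.1",
        "messages": [{"role": "user", "content": "<long input text>"}],
        "max_tokens": 2048
      }'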

Expected behavior

No response

Others

No response

hiyouga commented 4 months ago

vllm_maxlen: 8192
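That is, the usable context window of the vLLM backend is governed by LLaMA-Factory's vllm_maxlen setting, which otherwise caps the input length regardless of the rope_scaling entry in config.json. Assuming it can be passed on the command line like the other options above (a sketch, not verified against this exact version), the deployment command would become:

CUDA_VISIBLE_DEVICES=0,1 API_PORT=7864 llamafactory-cli api \
    --model_name_or_path /workspace/chat-1.1 \
    --template qwen \
    --infer_backend vllm \
    --vllm_enforce_eager true \
    --vllm_maxlen 8192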