xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large models.
Apache License 2.0

Error when starting server.py with vLLM #225

Closed: whm233 closed this issue 10 months ago

whm233 commented 10 months ago

The following items must be checked before submission

Type of problem

None

Operating system

None

Detailed description of the problem

The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (4512). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
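The error message itself names the two knobs. A minimal sketch of both, assuming vLLM's offline `LLM` entry point (the model path below is a placeholder):

```python
from vllm import LLM

# Sketch only: either cap the context window so it fits in the available
# KV cache, or give vLLM a larger share of GPU memory so the cache can
# hold the model's full 8192-token context.
llm = LLM(
    model="/path/to/model",       # placeholder path
    max_model_len=4096,           # stay under the 4512-token KV-cache limit reported above
    gpu_memory_utilization=0.95,  # or raise this from the ~0.90 default instead
)
```

The same two arguments also exist on vLLM's async engine args, so however server.py constructs its engine, passing them through should have the same effect.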

Dependencies

vllm 0.2.7, torch 2.1.0

Runtime logs or screenshots

# Please paste the run log here
xusenlinzy commented 10 months ago

Try setting the context length: set the environment variable CONTEXT_LEN=4096, or a smaller value.
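For reference, a hedged sketch of what that looks like when launching the server (`server.py` comes from the issue title; `CONTEXT_LEN` is the variable named above, equivalent to running `CONTEXT_LEN=4096 python server.py` in the shell):

```python
import os
import subprocess

# Export CONTEXT_LEN before server.py reads its configuration.
env = dict(os.environ, CONTEXT_LEN="4096")  # or a smaller value such as "2048"
subprocess.run(["python", "server.py"], env=env, check=True)
```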