OpenAI-style API for open large language models — use open LLMs just like ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. 开源大模型的统一后端接口 | A unified backend interface for open-source large language models
提交前必须检查以下项目 | The following items must be checked before submission
[X] 请确保使用的是仓库最新代码(git pull),一些问题已被解决和修复。 | Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
[X] 我已阅读项目文档和FAQ章节并且已在Issue中对问题进行了搜索,没有找到相似问题和解决方案 | I have read the project documentation and the FAQ, and I have searched the existing issues / discussions without finding a similar problem or solution
问题类型 | Type of problem
None
操作系统 | Operating system
None
详细描述问题 | Detailed description of the problem
The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (4512). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
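
For context, the two engine arguments named in the error are gpu_memory_utilization and max_model_len. The snippet below is only a minimal sketch of how they might be set when constructing the engine through vLLM's Python API directly (the model path is a placeholder); when the engine is launched through this project's server, the same values would need to be passed through whatever configuration it forwards to vLLM.

```python
# Minimal sketch (assumed values, placeholder model path) of the two settings
# the error message points at, using vLLM's Python API directly.
from vllm import LLM

llm = LLM(
    model="/path/to/your/model",    # placeholder: the actual model being served
    gpu_memory_utilization=0.95,    # raise above the 0.90 default to enlarge the KV cache
    max_model_len=4096,             # or cap the context at/below the ~4512 tokens the cache can hold
)
```

Either change alone can clear the error: a higher gpu_memory_utilization gives the KV cache more room, while a smaller max_model_len keeps the requested context within the cache that already fits.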
Dependencies
vllm 0.2.7
torch 2.1.0
运行日志或截图 | Runtime logs or screenshots