OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large language models.
The following items must be checked before submission
[X] Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
[X] I have read the project documentation and the FAQ section and searched the existing issues, and found no similar problem or solution.
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
# Paste the runtime code here (delete the code block if you don't have it)
Dependencies
# Please paste the dependencies here
Runtime logs or screenshots
2024-02-26 18:17:01.130 | DEBUG | api.vllm_routes.chat:create_chat_completion:64 - ==== request ====
{'model': 'qwen2', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'logprobs': False, 'max_tokens': 512, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|im_end|>', '<|endoftext|>'], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_logprobs': None, 'top_p': 1.0, 'user': None, 'stream': False, 'repetition_penalty': 1.03, 'typical_p': None, 'watermark': False, 'best_of': 1, 'ignore_eos': False, 'use_beam_search': False, 'stop_token_ids': [151643, 151644, 151645], 'skip_special_tokens': True, 'spaces_between_special_tokens': True, 'min_p': 0.0, 'prompt_or_messages': [{'content': '你好', 'role': 'user'}], 'echo': False}
INFO: 127.0.0.1:58946 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app( # type: ignore[func-returns-value]
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/miniconda3/envs/qwen/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "api/vllm_routes/chat.py", line 67, in create_chat_completion
    generator = engine.generate(params, request_id)
  File "api/core/vllm_engine.py", line 121, in generate
    prompt_or_messages = self.apply_chat_template(
  File "api/core/vllm_engine.py", line 83, in apply_chat_template
    return build_qwen_chat_input(
  File "api/generation/qwen.py", line 71, in build_qwen_chat_input
    im_start_tokens, im_end_tokens = [tokenizer.im_start_id], [tokenizer.im_end_id]
AttributeError: 'Qwen2TokenizerFast' object has no attribute 'im_start_id'
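A possible workaround, sketched under assumptions rather than taken from the repository: the traceback shows that `build_qwen_chat_input` reads `tokenizer.im_start_id` and `tokenizer.im_end_id`, which `Qwen2TokenizerFast` does not define. One option is to fall back to looking the special tokens up by name with `convert_tokens_to_ids`, a standard Hugging Face tokenizer method. The helper name `resolve_im_token_ids` and the stub tokenizers are hypothetical, and the IDs in the stubs are illustrative values borrowed from the `stop_token_ids` in the logged request; they are not verified against a real vocabulary.

```python
from typing import Any, List, Tuple

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def resolve_im_token_ids(tokenizer: Any) -> Tuple[List[int], List[int]]:
    """Return ([im_start_id], [im_end_id]) for old- and new-style Qwen tokenizers.

    Hypothetical helper: prefers the legacy `im_start_id` / `im_end_id`
    attributes and falls back to a vocabulary lookup when they are missing.
    """
    im_start_id = getattr(tokenizer, "im_start_id", None)
    im_end_id = getattr(tokenizer, "im_end_id", None)
    if im_start_id is None:
        im_start_id = tokenizer.convert_tokens_to_ids(IM_START)
    if im_end_id is None:
        im_end_id = tokenizer.convert_tokens_to_ids(IM_END)
    return [im_start_id], [im_end_id]


class StubFastTokenizer:
    """Stand-in for a tokenizer without im_start_id / im_end_id (illustrative IDs)."""

    _vocab = {"<|endoftext|>": 151643, IM_START: 151644, IM_END: 151645}

    def convert_tokens_to_ids(self, token: str) -> int:
        return self._vocab[token]


class StubLegacyTokenizer:
    """Stand-in for a tokenizer that exposes the legacy attributes (illustrative IDs)."""

    im_start_id = 151644
    im_end_id = 151645


if __name__ == "__main__":
    # Both tokenizer styles resolve to the same pair of single-element lists.
    print(resolve_im_token_ids(StubFastTokenizer()))
    print(resolve_im_token_ids(StubLegacyTokenizer()))
```

With such a helper, the failing line in `api/generation/qwen.py` could in principle become `im_start_tokens, im_end_tokens = resolve_im_token_ids(tokenizer)`. Whether the rest of `build_qwen_chat_input` then works with Qwen2 is not something this log can confirm.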