OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large language models.
The following items must be checked before submission
[X] Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
[X] I have read the project documentation and the FAQ section, and I have searched the existing issues / discussions without finding a similar problem or solution.
Type of problem
Model inference and deployment
Operating system
Linux
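Detailed description of the problem
If embedding_name is not set, the service starts normally. Once it is set, startup fails with the AttributeError shown in the runtime logs. I tried editing the source to replace 'device' with 'cuda' directly, but that produced other errors.
The docker-compose.yml used to start the service is as follows: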
Dependencies
No response
Runtime logs or screenshots
qwen | INFO 08-12 13:18:41 llm_engine.py:70] Initializing an LLM engine with config: model='/model/qwen', tokenizer='/model/qwen', tokenizer_mode=auto, trust_remote_code=True, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
qwen | WARNING 08-12 13:18:42 tokenizer.py:63] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
qwen | INFO 08-12 13:18:55 llm_engine.py:196] # GPU blocks: 562, # CPU blocks: 512
qwen | WARNING 08-12 13:18:58 tokenizer.py:63] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
qwen | Traceback (most recent call last):
qwen | File "api/vllm_server.py", line 666, in
qwen | embed_client = SentenceTransformer(args.embedding_name, device=args.device)
qwen | AttributeError: 'Namespace' object has no attribute 'device'
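
For anyone hitting the same traceback: it suggests that the argument parser in api/vllm_server.py never defines a device option, so reading args.device fails. Below is a minimal sketch that reproduces the error with plain argparse and shows one possible fallback. The parser, the resolve_device helper, and the /model/embedding path are hypothetical and are not taken from the repository. This only addresses the missing attribute, not the other errors mentioned in the description.

```python
import argparse


def resolve_device(args: argparse.Namespace, default: str = "cpu") -> str:
    """Return args.device when the parser defined it, otherwise a fallback.

    Hypothetical helper, not part of the project.
    """
    # getattr with a default avoids the AttributeError seen in the traceback.
    device = getattr(args, "device", None)
    return device if device else default


# Hypothetical parser that defines --embedding_name but no --device,
# which is the situation the traceback points to.
parser = argparse.ArgumentParser()
parser.add_argument("--embedding_name", type=str, default=None)
args = parser.parse_args(["--embedding_name", "/model/embedding"])

error_message = None
try:
    args.device  # same attribute access as the failing line
except AttributeError as exc:
    error_message = str(exc)

# Fall back to "cuda" here (an assumption; pick whatever suits the host).
device = resolve_device(args, default="cuda")
print(error_message)
print(device)
```

The resolved value could then be passed as the device argument of SentenceTransformer in place of args.device. Whether the project intends a dedicated --device flag instead is something only the maintainers can confirm.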