xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models, using LLMs just like ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large models.
Apache License 2.0
Docker deployment with vLLM returns 404 Not Found #271

Closed skyliwq closed 3 weeks ago

skyliwq commented 1 month ago

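I deployed a Qwen model with vLLM via Docker. Startup succeeds, but calls return "POST /v1/chat/completions HTTP/1.1" 404 Not Found. I can't work out the cause or fix it; any help would be appreciated. It shows "No operations defined in spec!"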

2024-05-09 07:00:03 INFO: Started server process [1]
2024-05-09 07:00:03 INFO: Waiting for application startup.
2024-05-09 07:00:03 INFO: Application startup complete.
2024-05-09 07:00:03 INFO: Uvicorn running on (Press CTRL+C to quit)
2024-05-09 07:00:04 INFO: - "POST /v1/chat/completions HTTP/1.1" 404 Not Found

Tendo33 commented 1 month ago


skyliwq commented 1 month ago


I sent the request with Postman, and it returns { "detail": "Not Found" }. The official tests chat.py also fails. I deployed with the official docker-compose file, and the configuration is all fine.

Tendo33 commented 1 month ago


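Does the request URL include /v1? If that doesn't help, open /docs on the deployment address to see the FastAPI endpoints; you can send requests directly from that page, just as you would with Postman.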
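The same check can be scripted. The following is a minimal sketch, assuming the server exposes FastAPI's default /openapi.json schema endpoint; the helper names `registered_paths` and `fetch_spec` are hypothetical and not part of api-for-open-llm. An empty path list corresponds to the "No operations defined in spec!" message, meaning no routes were registered at all.

```python
import json
import urllib.request


def registered_paths(spec: dict) -> list[str]:
    """Return the sorted route paths declared in an OpenAPI schema dict."""
    return sorted(spec.get("paths", {}))


def fetch_spec(base_url: str) -> dict:
    """Download the schema from FastAPI's default /openapi.json endpoint."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

For example, `registered_paths(fetch_spec("http://localhost:8000"))` (the host and port are assumptions; use your own deployment address) should include `/v1/chat/completions` on a working server. If the list is empty, the 404 comes from missing routes rather than from a wrong request URL.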

skyliwq commented 1 month ago

The request parameters and deployment parameters are all correct; I have checked them many times. It shows "No operations defined in spec!"

skyliwq commented 1 month ago

With ENGINE=vllm it fails; with ENGINE=default it works.

xusenlinzy commented 1 month ago


skyliwq commented 1 month ago

Then vLLM probably wasn't installed successfully. I deployed directly with Docker. How do I reinstall it? Any pointers would be appreciated.

root@a73600e73869:/workspace# pip show vllm
Name: vllm
Version: 0.4.0
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: cmake, fastapi, ninja, numpy, outlines, prometheus-client, psutil, py-cpuinfo, pydantic, pynvml, ray, requests, sentencepiece, tiktoken, torch, transformers, triton, uvicorn, xformers
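Note that `pip show` only confirms the package is installed, not that it imports cleanly. A minimal sketch for checking the import from inside the container follows; `try_import` and `report` are hypothetical helpers, not part of api-for-open-llm or vLLM.

```python
import importlib
import traceback


def try_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module; return (ok, traceback text or '')."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Catch broadly: a broken native dependency can raise more than ImportError.
        return False, traceback.format_exc()
    return True, ""


def report(module_name: str) -> str:
    """Format the import result as a one-line status or a full traceback."""
    ok, details = try_import(module_name)
    return f"{module_name}: OK" if ok else f"{module_name}: FAILED\n{details}"
```

Running `print(report("vllm"))` in the container's Python would either print `vllm: OK` or surface the full traceback that a silently handled import failure would otherwise hide.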

Tendo33 commented 1 month ago

Which Dockerfile did you use when you ran docker build for the image? Switch to the vllm one.

skyliwq commented 1 month ago



JadynWong commented 1 month ago

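Same problem here. With the latest code and a docker-compose vLLM deployment, the only GPU usage comes from the embedding model, and the logs show no errors. Requests return 404.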


2024-05-21 10:20:09.754 | DEBUG    | api.config:<module>:338 - SETTINGS: {
    "embedding_name": "/models/BAAI/bge-m3",
    "rerank_name": null,
    "embedding_size": -1,
    "embedding_device": "cuda:0",
    "rerank_device": "cuda:0",
    "trust_remote_code": true,
    "tokenize_mode": "slow",
    "tensor_parallel_size": 1,
    "gpu_memory_utilization": 0.9,
    "max_num_batched_tokens": -1,
    "max_num_seqs": 256,
    "quantization_method": null,
    "enforce_eager": false,
    "max_context_len_to_capture": 8192,
    "max_loras": 1,
    "max_lora_rank": 32,
    "lora_extra_vocab_size": 256,
    "lora_dtype": "auto",
    "max_cpu_loras": -1,
    "lora_modules": "",
    "vllm_disable_log_stats": true,
    "model_name": "qwen2",
    "model_path": "/models/Qwen/Qwen1.5-14B-Chat",
    "dtype": "bfloat16",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "context_length": -1,
    "chat_template": "qwen2",
    "rope_scaling": null,
    "flash_attn": false,
    "use_streamer_v2": true,
    "interrupt_requests": true,
    "host": "",
    "port": 8000,
    "api_prefix": "/v1",
    "engine": "vllm",
    "tasks": [
    "device_map": "auto",
    "gpus": null,
    "num_gpus": 1,
    "activate_inference": true,
    "model_names": [
    "api_keys": [
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on (Press CTRL+C to quit)
INFO: - "GET /v1/models HTTP/1.1" 404 Not Found
INFO: - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
JadynWong commented 1 month ago


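https://github.com/xusenlinzy/api-for-open-llm/blob/e46e48056a02ffbd90e0dfe4bc2f803df1e7e4e1/api/models.py#L100 I added a line here to print the exception log.

I only pulled the code this afternoon and rebuilt the image; there were no errors at any point.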

docker build -f docker/Dockerfile.vllm -t llm-api:vllm .

Possibly related issue: https://github.com/vllm-project/vllm/issues/3528

liho00 commented 1 month ago
