Closed · skyliwq closed this issue 3 weeks ago
Please share the request script you are using.
I sent the request with Postman and got { "detail": "Not Found" }. The official tests/chat.py script also errors out. I deployed with the official docker-compose file; the configuration is all correct.
Did you append /v1 to the request URL? If that still fails, open /docs on the deployment address to inspect the FastAPI endpoints; you can call them directly there, with the same effect as Postman.
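For reference, a minimal sketch of such a request script, assuming an OpenAI-compatible deployment with the default /v1 prefix (the base URL, model name, and API key below are placeholders; adjust them to your setup, e.g. port 7891 if that is what you mapped):

```python
import json
import urllib.request

# Placeholder values; adjust to your deployment.
BASE_URL = "http://127.0.0.1:8000"
API_KEY = "sk-xxx"

# The routes are mounted under the api_prefix ("/v1" by default),
# so the chat endpoint is /v1/chat/completions, not /chat/completions.
url = BASE_URL + "/v1/chat/completions"
payload = {
    "model": "qwen2",
    "messages": [{"role": "user", "content": "Hello"}],
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
    method="POST",
)
# Uncomment to actually send the request against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read()))
print(request.full_url)
```

If this URL returns 404 while /docs shows "No operations defined in spec!", the problem is server-side route registration rather than the client.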
The request parameters and deployment parameters are both correct; I verified them many times. http://127.0.0.1:7891/docs shows "No operations defined in spec!"
With ENGINE=vllm configured it errors; with ENGINE=default it works fine.
Then vllm probably failed to install correctly.
I deployed directly with Docker. How do I reinstall it? Please advise.
root@a73600e73869:/workspace# pip show vllm
Name: vllm
Version: 0.4.0
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: cmake, fastapi, ninja, numpy, outlines, prometheus-client, psutil, py-cpuinfo, pydantic, pynvml, ray, requests, sentencepiece, tiktoken, torch, transformers, triton, uvicorn, xformers
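Note that pip show only proves the package metadata is present; the import itself can still fail (for example, a torch/CUDA mismatch inside the container), in which case the server may fall back and never register the vllm routes. A quick hedged check, run inside the container:

```python
import importlib

def check_engine_import(name: str):
    """Try importing an engine module; return (ok, version_or_error)."""
    try:
        mod = importlib.import_module(name)
        return True, getattr(mod, "__version__", "unknown")
    except Exception as exc:
        # Import errors usually carry the real root cause (CUDA, torch, ...).
        return False, repr(exc)

ok, detail = check_engine_import("vllm")
print("vllm import ok:", ok, "-", detail)
```

If this prints ok: False, the detail string is the actual error to fix; reinstalling vllm without resolving it will reproduce the same 404.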
Which Dockerfile did you use when you built the image? Switch to the vllm one.
vllm
That is the one I switched to.
Same problem here: latest code, deployed with docker-compose using vllm. The GPU shows usage only from the embedding model, the logs report no errors, and requests return 404.
LOG
=============
== PyTorch ==
=============
NVIDIA Release 23.10 (build 71422337)
PyTorch Version 2.1.0a0+32f93b1
Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2023 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
2024-05-21 10:20:09.754 | DEBUG | api.config:<module>:338 - SETTINGS: {
"embedding_name": "/models/BAAI/bge-m3",
"rerank_name": null,
"embedding_size": -1,
"embedding_device": "cuda:0",
"rerank_device": "cuda:0",
"trust_remote_code": true,
"tokenize_mode": "slow",
"tensor_parallel_size": 1,
"gpu_memory_utilization": 0.9,
"max_num_batched_tokens": -1,
"max_num_seqs": 256,
"quantization_method": null,
"enforce_eager": false,
"max_context_len_to_capture": 8192,
"max_loras": 1,
"max_lora_rank": 32,
"lora_extra_vocab_size": 256,
"lora_dtype": "auto",
"max_cpu_loras": -1,
"lora_modules": "",
"vllm_disable_log_stats": true,
"model_name": "qwen2",
"model_path": "/models/Qwen/Qwen1.5-14B-Chat",
"dtype": "bfloat16",
"load_in_8bit": false,
"load_in_4bit": false,
"context_length": -1,
"chat_template": "qwen2",
"rope_scaling": null,
"flash_attn": false,
"use_streamer_v2": true,
"interrupt_requests": true,
"host": "0.0.0.0",
"port": 8000,
"api_prefix": "/v1",
"engine": "vllm",
"tasks": [
"llm",
"rag"
],
"device_map": "auto",
"gpus": null,
"num_gpus": 1,
"activate_inference": true,
"model_names": [
"qwen2",
"bge-m3"
],
"api_keys": [
"xxxxxxxxxxxxxxx"
]
}
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 172.18.0.1:51554 - "GET /v1/models HTTP/1.1" 404 Not Found
INFO: 172.18.0.1:45768 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
https://github.com/xusenlinzy/api-for-open-llm/blob/e46e48056a02ffbd90e0dfe4bc2f803df1e7e4e1/api/models.py#L100 — I added a line here to print the exception.
I pulled the code just this afternoon and rebuilt the image; there were no errors during the build.
docker build -f docker/Dockerfile.vllm -t llm-api:vllm .
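The kind of change described above is roughly the following pattern (the function names here are illustrative stand-ins, not the project's actual API): wrap engine creation so that a failure is printed instead of being swallowed, which would otherwise leave the server up with no routes registered.

```python
import traceback

def create_engine_safely(factory):
    """Call an engine factory; print the traceback and return None on failure."""
    try:
        return factory()
    except Exception:
        # The added logging: surface the real error instead of failing silently.
        traceback.print_exc()
        return None

def broken_factory():
    # Stand-in for a vllm engine constructor that raises at init time.
    raise RuntimeError("CUDA error: no kernel image is available")

engine = create_engine_safely(broken_factory)
print("engine:", engine)  # engine: None, with the traceback printed above
```

With logging like this in place, the container log should show the underlying exception instead of a clean startup followed by 404s.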
I'm hitting the same problem; looks like a vllm issue?
Docker deployment of a Qwen model with vllm starts successfully, but calls return "POST /v1/chat/completions HTTP/1.1" 404 Not Found. I can't figure out why; please help.
http://127.0.0.1:7891/docs shows "No operations defined in spec!"
2024-05-09 07:00:03 INFO: Started server process [1]
2024-05-09 07:00:03 INFO: Waiting for application startup.
2024-05-09 07:00:03 INFO: Application startup complete.
2024-05-09 07:00:03 INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-05-09 07:00:04 INFO: 172.16.1.1:57384 - "POST /v1/chat/completions HTTP/1.1" 404 Not Found
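The "No operations defined in spec!" message means the served openapi.json has an empty "paths" object: the application started, but no routers were registered, which is consistent with the engine failing to initialize. A small sketch of what that symptom looks like in the spec itself:

```python
import json

def count_operations(openapi_doc: str) -> int:
    """Count HTTP operations declared in an OpenAPI JSON document."""
    spec = json.loads(openapi_doc)
    return sum(len(ops) for ops in spec.get("paths", {}).values())

# A healthy deployment exposes the /v1 routes; a broken one serves no paths,
# which is what makes /docs render "No operations defined in spec!".
healthy = '{"openapi": "3.1.0", "paths": {"/v1/models": {"get": {}}, "/v1/chat/completions": {"post": {}}}}'
broken = '{"openapi": "3.1.0", "paths": {}}'
print(count_operations(healthy))  # 2
print(count_operations(broken))   # 0
```

So the 404 is not a wrong URL: the routes simply never existed in this process, and the fix is to get the vllm engine to initialize (or to surface its startup exception, as above).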