Please enter the size of the LLM you are using, in billions of parameters (e.g. 1.8B/3B/7B): 3B
model_size=3B
GPUID1=0, GPUID2=0, device_id=0
llm_api is set to [local]
device_id is set to [0]
runtime_backend is set to [hf]
model_name is set to [MiniChat-2-3B]
conv_template is set to [minichat]
tensor_parallel is set to [1]
gpu_memory_utilization is set to [0.81]
Do you want to use the previous ip: localhost? (yes/no) Press Enter to default to yes. Please enter:
Running under native Linux
[+] Running 5/6
⠼ Network qanything_milvus_mysql_local Created 1.5s
✔ Container milvus-minio-local Started 0.6s
✔ Container mysql-container-local Started 0.8s
✔ Container milvus-etcd-local Started 0.8s
✔ Container milvus-standalone-local Started 0.9s
✔ Container qanything-container-local Started 1.2s
qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [MiniChat-2-3B]
qanything-container-local | conv_template is set to [minichat]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.81]
qanything-container-local | checksum 3299f3701e32a952d9d5769897bbb4b5
qanything-container-local | default_checksum 3299f3701e32a952d9d5769897bbb4b5
qanything-container-local |
qanything-container-local | [notice] A new release of pip is available: 23.3.2 -> 24.0
qanything-container-local | [notice] To update, run: python3 -m pip install --upgrade pip
qanything-container-local | GPU ID: 0, 0
qanything-container-local | GPU1 Model: NVIDIA GeForce RTX 4080 SUPER
qanything-container-local | Compute Capability: 8.9
qanything-container-local | OCR_USE_GPU=True because 8.9 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | ******************** Important notice ********************
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | Your current GPU memory is 16376 MiB; deploying an LLM of 7B or smaller is recommended
qanything-container-local | The token limit is set to 4096 by default
qanything-container-local | The triton server for embedding and reranker will start on 0 GPUs
qanything-container-local | Executing hf runtime_backend
qanything-container-local | The rerank service is ready! (2/8)
qanything-container-local | The ocr service is ready! (3/8)
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | Waiting for the backend service to start...
qanything-container-local | The qanything backend service is ready! (4/8)
qanything-container-local | Dependencies related to npm are obtained. (5/8)
qanything-container-local | The front_end/dist folder already exists, no need to build the front end again.(6/8)
qanything-container-local | Waiting for the front-end service to start...
qanything-container-local |
qanything-container-local | > ai-demo@1.0.1 serve
qanything-container-local | > vite preview --port 5052
qanything-container-local |
qanything-container-local | The CJS build of Vite's Node API is deprecated. See https://vitejs.dev/guide/troubleshooting.html#vite-cjs-node-api-deprecated for more details.
qanything-container-local | ➜ Local: http://localhost:5052/qanything
qanything-container-local | ➜ Network: http://172.20.0.6:5052/qanything
qanything-container-local | The front-end service is ready!...(7/8)
qanything-container-local | I0604 01:29:55.535924 129 grpc_server.cc:377] Thread started for CommonHandler
qanything-container-local | I0604 01:29:55.535953 129 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0604 01:29:55.535959 129 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0604 01:29:55.535986 129 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local | I0604 01:29:55.535991 129 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local | I0604 01:29:55.536022 129 stream_infer_handler.cc:122] New request handler for ModelStreamInferHandler, 0
qanything-container-local | I0604 01:29:55.536026 129 infer_handler.h:1025] Thread started for ModelStreamInferHandler
qanything-container-local | I0604 01:29:55.536028 129 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:9001
qanything-container-local | I0604 01:29:55.536133 129 http_server.cc:3555] Started HTTPService at 0.0.0.0:9000
qanything-container-local | I0604 01:29:55.576918 129 http_server.cc:185] Started Metrics Service at 0.0.0.0:9002
qanything-container-local | I0604 01:30:10.435466 129 http_server.cc:3449] HTTP request: 0 /v2/health/ready
qanything-container-local | The embedding and rerank service is ready!. (7.5/8)
qanything-container-local | You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
qanything-container-local | 2024-06-04 09:29:55 | ERROR | stderr | 0%| | 0/1 [00:00<?, ?it/s]
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr | 100%|██████████| 1/1 [00:14<00:00, 14.91s/it]
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr |
qanything-container-local | 2024-06-04 09:30:10 | INFO | model_worker | Register to controller
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr | INFO: Started server process [144]
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr | INFO: Waiting for application startup.
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr | INFO: Application startup complete.
qanything-container-local | 2024-06-04 09:30:10 | ERROR | stderr | INFO: Uvicorn running on http://0.0.0.0:7801 (Press CTRL+C to quit)
qanything-container-local | % Total % Received % Xferd Average Speed Time Time Time Current
qanything-container-local | Dload Upload Total Spent Left Speed
100 28 100 28 0 0 49557 0 --:--:-- --:--:-- --:--:-- 28000
qanything-container-local | The llm service is ready!, now you can use the qanything service. (8/8)
qanything-container-local | Starting to check the log files for error messages...
qanything-container-local | No explicit error messages were detected in /workspace/qanything_local/logs/debug_logs/rerank_server.log. Please check /workspace/qanything_local/logs/debug_logs/rerank_server.log manually for more information.
qanything-container-local | No explicit error messages were detected in /workspace/qanything_local/logs/debug_logs/ocr_server.log. Please check /workspace/qanything_local/logs/debug_logs/ocr_server.log manually for more information.
qanything-container-local | No explicit error messages were detected in /workspace/qanything_local/logs/debug_logs/sanic_api.log. Please check /workspace/qanything_local/logs/debug_logs/sanic_api.log manually for more information.
qanything-container-local | Time elapsed: 17 seconds.
qanything-container-local | Please visit the front-end service at [http://localhost:5052/qanything/] to conduct Q&A.
qanything-container-local | If the front end reports an error, press F12 in the browser to get more error details.
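To narrow down where the pending chat request stalls, it may help to first confirm that every service the startup log reports as ready is actually listening. The sketch below is a hypothetical diagnostic, not part of QAnything: it probes the ports printed in the log (Triton HTTP 9000, gRPC 9001, metrics 9002, the Uvicorn server on 7801 that starts right after the model worker registers with the controller, and the Vite front end on 5052) and queries Triton's `/v2/health/ready` endpoint, which the log itself shows being polled. It assumes it runs inside `qanything-container-local`, since the host-side port mapping is not shown in this report.

```python
import http.client
import socket
import urllib.request

# Ports as printed in the startup log. Assumption: this script runs inside
# qanything-container-local; the host-side port mapping is not known here.
SERVICES = {
    "triton http": 9000,
    "triton grpc": 9001,
    "triton metrics": 9002,
    "uvicorn (after model_worker)": 7801,
    "front end (vite preview)": 5052,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triton_ready(host: str = "localhost", port: int = 9000, timeout: float = 2.0) -> bool:
    """Query Triton's KServe v2 readiness endpoint; HTTP 200 means ready."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; HTTPException covers
        # a non-HTTP service answering on the port.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "open" if port_open("localhost", port) else "CLOSED"
        print(f"{name:30} :{port:<5} {state}")
    print("triton /v2/health/ready:", triton_ready())
```

If every port is open and Triton reports ready, that would suggest the stall lies in the request path (for example the backend's call to the LLM) rather than in service startup, which points back to `sanic_api.log` and `model-worker.log`.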
Is there an existing issue / discussion for this?
Is there an existing answer for this in the FAQ?
Current Behavior
As shown in the screenshot, the conversation stays pending indefinitely.
Expected Behavior
Fix the bug so that the conversation returns an answer instead of remaining pending.
Environment
QAnything logs
Startup log
sanic_api.log
debug.log
model-worker.log
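The startup script reports that it found no explicit errors, but a manual scan of the attached logs may still surface something. One caveat visible in the startup log: ordinary messages such as `| ERROR | stderr | INFO: Started server process` and tqdm progress bars are written through stderr and tagged at ERROR level, so a naive search for `ERROR` produces false positives. The hypothetical helper below keeps a stderr-tagged line only when the message after the tag itself contains an error keyword. The keyword list and the log directory (taken from the startup log's `/workspace/qanything_local/logs/debug_logs`) are assumptions; `debug.log` and `model-worker.log` may live in a different directory.

```python
import re
from pathlib import Path

# Assumed keyword list; adjust to taste.
ERROR_HINT = re.compile(r"ERROR|Traceback|Exception|out of memory")
# In the startup log, everything written to stderr is tagged "| ERROR | stderr |",
# including uvicorn INFO lines and tqdm progress bars.
STDERR_TAG = re.compile(r"\|\s*ERROR\s*\|\s*stderr\s*\|")


def suspicious_lines(path, max_hits=50):
    """Return (line_number, text) pairs that look like real errors."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            m = STDERR_TAG.search(line)
            # For stderr-tagged lines, judge only the message after the tag.
            text = line[m.end():] if m else line
            if ERROR_HINT.search(text):
                hits.append((lineno, line.rstrip("\n")))
                if len(hits) >= max_hits:
                    break
    return hits


if __name__ == "__main__":
    # Directory taken from the startup log (assumption: run inside the container).
    log_dir = Path("/workspace/qanything_local/logs/debug_logs")
    for log in sorted(log_dir.glob("*.log")):
        for lineno, text in suspicious_lines(log):
            print(f"{log.name}:{lineno}: {text}")
```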
Steps To Reproduce
No response
Anything else?
No response