[BUG] Chat and knowledge-base API requests get no response and stay pending

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

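Current Behavior

As shown in the screenshot, requests stay pending indefinitely during a chat.

Expected Behavior

Fix the bug so that chat and knowledge-base requests return a response instead of staying pending.

Environment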

- OS: Ubuntu 22.04
- NVIDIA Driver: 525.105.17
- CUDA: Cuda compilation tools, release 12.1, V12.1.105
- docker: 25.03
- docker-compose: 24.0.5
- NVIDIA GPU: GeForce RTX 4080 SUPER
- NVIDIA GPU Memory: 16GB

QAnything logs

Startup log

Please enter the size, in billions of parameters (B), of the LLM you are using (e.g. 1.8B/3B/7B): 3B
model_size=3B
GPUID1=0, GPUID2=0, device_id=0
llm_api is set to [local]
device_id is set to [0]
runtime_backend is set to [hf]
model_name is set to [MiniChat-2-3B]
conv_template is set to [minichat]
tensor_parallel is set to [1]
gpu_memory_utilization is set to [0.81]
Do you want to use the previous ip: localhost? (yes/no) Press Enter to default to yes. Please enter:
Running under native Linux
[+] Running 5/6
 ⠼ Network qanything_milvus_mysql_local  Created                                                                                                                                                              1.5s 
 ✔ Container milvus-minio-local          Started                                                                                                                                                              0.6s 
 ✔ Container mysql-container-local       Started                                                                                                                                                              0.8s 
 ✔ Container milvus-etcd-local           Started                                                                                                                                                              0.8s 
 ✔ Container milvus-standalone-local     Started                                                                                                                                                              0.9s 
 ✔ Container qanything-container-local   Started                                                                                                                                                              1.2s 
qanything-container-local  | 
qanything-container-local  | =============================
qanything-container-local  | == Triton Inference Server ==
qanything-container-local  | =============================
qanything-container-local  | 
qanything-container-local  | NVIDIA Release 23.05 (build 61161506)
qanything-container-local  | Triton Server Version 2.34.0
qanything-container-local  | 
qanything-container-local  | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
qanything-container-local  | 
qanything-container-local  | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
qanything-container-local  | 
qanything-container-local  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local  | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local  | 
qanything-container-local  | llm_api is set to [local]
qanything-container-local  | device_id is set to [0]
qanything-container-local  | runtime_backend is set to [hf]
qanything-container-local  | model_name is set to [MiniChat-2-3B]
qanything-container-local  | conv_template is set to [minichat]
qanything-container-local  | tensor_parallel is set to [1]
qanything-container-local  | gpu_memory_utilization is set to [0.81]
qanything-container-local  | checksum 3299f3701e32a952d9d5769897bbb4b5
qanything-container-local  | default_checksum 3299f3701e32a952d9d5769897bbb4b5
qanything-container-local  | 
qanything-container-local  | [notice] A new release of pip is available: 23.3.2 -> 24.0
qanything-container-local  | [notice] To update, run: python3 -m pip install --upgrade pip
qanything-container-local  | GPU ID: 0, 0
qanything-container-local  | GPU1 Model: NVIDIA GeForce RTX 4080 SUPER
qanything-container-local  | Compute Capability: 8.9
qanything-container-local  | OCR_USE_GPU=True because 8.9 >= 7.5
qanything-container-local  | ====================================================
qanything-container-local  | ******************** Important notice ********************
qanything-container-local  | ====================================================
qanything-container-local  | 
qanything-container-local  | Your current GPU memory is 16376 MiB; deploying an LLM of 7B or smaller is recommended
qanything-container-local  | The token limit defaults to 4096
qanything-container-local  | The triton server for embedding and reranker will start on 0 GPUs
qanything-container-local  | Executing hf runtime_backend
qanything-container-local  | The rerank service is ready! (2/8)
qanything-container-local  | rerank服务已就绪! (2/8)
qanything-container-local  | The ocr service is ready! (3/8)
qanything-container-local  | OCR服务已就绪! (3/8)
qanything-container-local  | Waiting for the backend service to start...
qanything-container-local  | 等待启动后端服务
qanything-container-local  | Waiting for the backend service to start...
qanything-container-local  | 等待启动后端服务
qanything-container-local  | The qanything backend service is ready! (4/8)
qanything-container-local  | qanything后端服务已就绪! (4/8)
qanything-container-local  | Dependencies related to npm are obtained. (5/8)
qanything-container-local  | The front_end/dist folder already exists, no need to build the front end again.(6/8)
qanything-container-local  | Waiting for the front-end service to start...
qanything-container-local  | 等待启动前端服务
qanything-container-local  | 
qanything-container-local  | > ai-demo@1.0.1 serve
qanything-container-local  | > vite preview --port 5052
qanything-container-local  | 
qanything-container-local  | The CJS build of Vite's Node API is deprecated. See https://vitejs.dev/guide/troubleshooting.html#vite-cjs-node-api-deprecated for more details.
qanything-container-local  |   ➜  Local:   http://localhost:5052/qanything
qanything-container-local  |   ➜  Network: http://172.20.0.6:5052/qanything
qanything-container-local  | The front-end service is ready!...(7/8)
qanything-container-local  | 前端服务已就绪!...(7/8)
qanything-container-local  | I0604 01:29:55.535924 129 grpc_server.cc:377] Thread started for CommonHandler
qanything-container-local  | I0604 01:29:55.535953 129 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local  | I0604 01:29:55.535959 129 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local  | I0604 01:29:55.535986 129 infer_handler.cc:629] New request handler for ModelInferHandler, 0
qanything-container-local  | I0604 01:29:55.535991 129 infer_handler.h:1025] Thread started for ModelInferHandler
qanything-container-local  | I0604 01:29:55.536022 129 stream_infer_handler.cc:122] New request handler for ModelStreamInferHandler, 0
qanything-container-local  | I0604 01:29:55.536026 129 infer_handler.h:1025] Thread started for ModelStreamInferHandler
qanything-container-local  | I0604 01:29:55.536028 129 grpc_server.cc:2450] Started GRPCInferenceService at 0.0.0.0:9001
qanything-container-local  | I0604 01:29:55.536133 129 http_server.cc:3555] Started HTTPService at 0.0.0.0:9000
qanything-container-local  | I0604 01:29:55.576918 129 http_server.cc:185] Started Metrics Service at 0.0.0.0:9002
qanything-container-local  | I0604 01:30:10.435466 129 http_server.cc:3449] HTTP request: 0 /v2/health/ready
qanything-container-local  | The embedding and rerank service is ready!. (7.5/8)
qanything-container-local  | Embedding 和 Rerank 服务已准备就绪！(7.5/8)
qanything-container-local  | You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
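qanything-container-local  | 2024-06-04 09:29:55 | ERROR | stderr | 
  0%|          | 0/1 [00:00<?, ?it/s]
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | 
100%|██████████| 1/1 [00:14<00:00, 14.91s/it]
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | 
100%|██████████| 1/1 [00:14<00:00, 14.91s/it]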
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | 
qanything-container-local  | 2024-06-04 09:30:10 | INFO | model_worker | Register to controller
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | INFO:     Started server process [144]
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | INFO:     Waiting for application startup.
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | INFO:     Application startup complete.
qanything-container-local  | 2024-06-04 09:30:10 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:7801 (Press CTRL+C to quit)
qanything-container-local  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
qanything-container-local  |                                  Dload  Upload   Total   Spent    Left  Speed
100    28  100    28    0     0  49557      0 --:--:-- --:--:-- --:--:-- 28000
qanything-container-local  | The llm service is ready!, now you can use the qanything service. (8/8)
qanything-container-local  | LLM 服务已准备就绪！现在您可以使用qanything服务。（8/8)
qanything-container-local  | Starting to check the log files for error messages...
qanything-container-local  | No explicit error message was detected in /workspace/qanything_local/logs/debug_logs/rerank_server.log. Please inspect /workspace/qanything_local/logs/debug_logs/rerank_server.log manually for more information.
qanything-container-local  | No explicit error message was detected in /workspace/qanything_local/logs/debug_logs/ocr_server.log. Please inspect /workspace/qanything_local/logs/debug_logs/ocr_server.log manually for more information.
qanything-container-local  | No explicit error message was detected in /workspace/qanything_local/logs/debug_logs/sanic_api.log. Please inspect /workspace/qanything_local/logs/debug_logs/sanic_api.log manually for more information.
qanything-container-local  | Time elapsed: 17 seconds.
qanything-container-local  | 已耗时: 17 秒.
qanything-container-local  | Please visit the front-end service at [http://localhost:5052/qanything/] to conduct Q&A.
qanything-container-local  | 请在[http://localhost:5052/qanything/]下访问前端服务来进行问答，如果前端报错，请在浏览器按F12以获取更多报错信息

sanic_api.log

INFO:debug_logger:history: [] 
INFO:debug_logger:question: 测试
INFO:debug_logger:kb_ids: ['KB2027ce29d6384fc5a6bb4a7bfc59c9ff']
INFO:debug_logger:user_id: zzp
INFO:debug_logger:check_kb_exist [('KB2027ce29d6384fc5a6bb4a7bfc59c9ff',)]
INFO:debug_logger:collection zzp exists
INFO:debug_logger:partitions: ['KB2027ce29d6384fc5a6bb4a7bfc59c9ff']
INFO:debug_logger:list_docs zzp
INFO:debug_logger:kb_id: KBb9a5b4d0871c48bb8338df488ddd2054
INFO:debug_logger:list_docs zzp
INFO:debug_logger:kb_id: KB2be1fe6448384b81bd8872821917dfe0
INFO:debug_logger:list_docs zzp

debug.log

2024-06-04 09:47:45,729 - [PID: 886][Sanic-Server-3-0] - [Function: local_doc_chat] - INFO - rerank True
2024-06-04 09:47:45,729 - [PID: 886][Sanic-Server-3-0] - [Function: local_doc_chat] - INFO - history: [] 
2024-06-04 09:47:45,729 - [PID: 886][Sanic-Server-3-0] - [Function: local_doc_chat] - INFO - question: 测试
2024-06-04 09:47:45,729 - [PID: 886][Sanic-Server-3-0] - [Function: local_doc_chat] - INFO - kb_ids: ['KB2027ce29d6384fc5a6bb4a7bfc59c9ff']
2024-06-04 09:47:45,730 - [PID: 886][Sanic-Server-3-0] - [Function: local_doc_chat] - INFO - user_id: zzp
2024-06-04 09:47:45,731 - [PID: 886][Sanic-Server-3-0] - [Function: check_kb_exist] - INFO - check_kb_exist [('KB2027ce29d6384fc5a6bb4a7bfc59c9ff',)]
2024-06-04 09:47:45,755 - [PID: 886][Sanic-Server-3-0] - [Function: init] - INFO - collection zzp exists
2024-06-04 09:47:45,757 - [PID: 886][Sanic-Server-3-0] - [Function: init] - INFO - partitions: ['KB2027ce29d6384fc5a6bb4a7bfc59c9ff']
2024-06-04 09:48:27,009 - [PID: 884][Sanic-Server-1-0] - [Function: list_docs] - INFO - list_docs zzp
2024-06-04 09:48:27,010 - [PID: 884][Sanic-Server-1-0] - [Function: list_docs] - INFO - kb_id: KBb9a5b4d0871c48bb8338df488ddd2054
2024-06-04 09:48:27,108 - [PID: 884][Sanic-Server-1-0] - [Function: list_docs] - INFO - list_docs zzp
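
For triage, the debug.log excerpt above can be summarised per worker process to see where each one last logged. In the excerpt, the last entry from PID 886 (the worker handling `local_doc_chat`) is the `partitions` line at 09:47:45,757. The following is a minimal sketch, assuming the log line format shown above; `LINE_RE` and `last_entry_per_worker` are hypothetical helper names and not part of QAnything.

```python
import re
from datetime import datetime

# Assumed line format, taken from the debug.log excerpt above, e.g.
# 2024-06-04 09:47:45,757 - [PID: 886][Sanic-Server-3-0] - [Function: init] - INFO - partitions: [...]
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"\[PID: (?P<pid>\d+)\]\[(?P<worker>[^\]]+)\] - "
    r"\[Function: (?P<func>[^\]]+)\] - (?P<level>\w+) - (?P<msg>.*)$"
)


def last_entry_per_worker(lines):
    """Return {pid: (timestamp, function, message)} for the last line each PID logged."""
    last = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the assumed format
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        last[int(m["pid"])] = (ts, m["func"], m["msg"])
    return last


if __name__ == "__main__":
    # Two lines copied from the excerpt above; a real run would read debug.log instead.
    sample = [
        "2024-06-04 09:47:45,757 - [PID: 886][Sanic-Server-3-0] - [Function: init] - INFO - partitions: ['KB2027ce29d6384fc5a6bb4a7bfc59c9ff']",
        "2024-06-04 09:48:27,108 - [PID: 884][Sanic-Server-1-0] - [Function: list_docs] - INFO - list_docs zzp",
    ]
    for pid, (ts, func, msg) in sorted(last_entry_per_worker(sample).items()):
        print(pid, ts, func, msg)
```

A worker whose last entry is much older than the others, and that never logs a completion for the request it started, is a candidate for where the request is stuck. This only narrows the search; it does not identify the cause.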

model-worker.log

2024-06-04 09:29:54 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/MiniChat-2-3B', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='minichat', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-06-04 09:29:54 | INFO | model_worker | Loading the model ['MiniChat-2-3B'] on worker 3f33aa48 ...
2024-06-04 09:29:55 | ERROR | stderr | 
  0%|          | 0/1 [00:00<?, ?it/s]
2024-06-04 09:30:10 | ERROR | stderr | 
100%|██████████| 1/1 [00:14<00:00, 14.91s/it]
2024-06-04 09:30:10 | ERROR | stderr | 
100%|██████████| 1/1 [00:14<00:00, 14.91s/it]
2024-06-04 09:30:10 | ERROR | stderr |

Steps To Reproduce

No response

Anything else?

No response

netease-youdao / QAnything

[BUG] Chat and knowledge-base API requests get no response and stay pending #379
