netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0

Triton service startup times out; there is a log file QAEnsemble.log in models #75

Open Wimet7 opened 9 months ago

Wimet7 commented 9 months ago

The last log lines show:

qanything-container-local | Triton服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | The triton service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | Triton服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | 启动Triton服务超时,请进入容器内检查/model_repos/QAEnsemble_base/QAEnsemble_base.log以获取更多信息。
qanything-container-local exited with code 1

(The last Chinese message reads: "Starting the Triton service timed out; please enter the container and check /model_repos/QAEnsemble_base/QAEnsemble_base.log for more information.")

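1. While the container was running, I could not find a QAEnsemble_base.log file in the /model_repos/QAEnsemble_base/ directory inside the container.
2. In the log file QAEnsemble.log under models, I see the following error: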

I0126 01:57:31.730858 97 server.cc:283] Waiting for in-flight requests to complete.
I0126 01:57:31.730867 97 server.cc:299] Timeout 30: Found 0 model versions that have in-flight inferences
I0126 01:57:31.730963 97 server.cc:314] All models are stopped, unloading models
I0126 01:57:31.730978 97 server.cc:321] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0126 01:57:31.730987 97 server.cc:328] embed v1: UNLOADING
I0126 01:57:31.730994 97 server.cc:328] rerank v1: UNLOADING
I0126 01:57:31.731111 97 backend_model_instance.cc:823] Stopping backend thread for rerank...
I0126 01:57:31.731180 97 backend_model_instance.cc:823] Stopping backend thread for embed...
I0126 01:57:31.731207 97 onnxruntime.cc:2685] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0126 01:57:31.731238 97 onnxruntime.cc:2685] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0126 01:57:31.755773 97 onnxruntime.cc:2631] TRITONBACKEND_ModelFinalize: delete model state
I0126 01:57:31.756362 97 model_lifecycle.cc:608] successfully unloaded 'embed' version 1
I0126 01:57:31.759827 97 onnxruntime.cc:2631] TRITONBACKEND_ModelFinalize: delete model state
I0126 01:57:31.761468 97 model_lifecycle.cc:608] successfully unloaded 'rerank' version 1
I0126 01:57:32.731150 97 server.cc:321] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
I0126 01:57:32.731325 97 backend_manager.cc:137] unloading backend 'onnxruntime'
error: creating server: Internal - failed to load all models
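The excerpt above contains only informational (I-prefixed) shutdown lines plus the final "failed to load all models" message, so the entry that explains why a model failed to load is probably earlier in the same file. A minimal sketch for pulling out warning, error, and fatal entries, assuming the log uses the glog-style prefix seen above (severity letter, MMDD, time, thread id, source file:line); the function name `find_problems` is made up for illustration:

```python
import re

# glog-style prefix: severity letter, MMDD, time, thread id, "file:line]"
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[\w./-]+:\d+)\] (?P<msg>.*)$"
)


def find_problems(log_text):
    """Return (level, source, message) tuples for W/E/F entries,
    plus bare lines starting with 'error:' that have no glog prefix."""
    problems = []
    for line in log_text.splitlines():
        line = line.strip()
        match = GLOG_LINE.match(line)
        if match:
            # Skip informational entries; keep warnings, errors, fatals.
            if match.group("level") in "WEF":
                problems.append(
                    (match.group("level"), match.group("src"), match.group("msg"))
                )
        elif line.lower().startswith("error:"):
            problems.append(("E", "", line))
    return problems


# Two lines taken from the log above; only the second is reported.
sample = (
    "I0126 01:57:32.731325 97 backend_manager.cc:137] unloading backend 'onnxruntime'\n"
    "error: creating server: Internal - failed to load all models"
)
for level, src, msg in find_problems(sample):
    print(level, src, msg)
```

Running it over the full QAEnsemble.log rather than the tail should surface any E or W lines logged before the server began unloading models.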

Could you please help me figure out what is going on?
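Since the log was not at the path named in the timeout message, one option is to search for log files instead of guessing their location. A minimal sketch, assuming Python is available inside the container; `find_log_files` is a hypothetical helper, and `/model_repos` is the only path taken from the messages above:

```python
from pathlib import Path


def find_log_files(root, pattern="*.log"):
    """Recursively list files under root matching pattern, newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob(pattern) if p.is_file()]
    # Most recently modified first, so the latest startup attempt is on top.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # /model_repos is the directory named in the timeout message.
    for path in find_log_files("/model_repos"):
        print(path)
```

The same function can be pointed at the models directory where QAEnsemble.log was found.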

Zhu-811 commented 9 months ago

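Same problem here. I found similar reports in the issues, but none of them has a solution. The community discussion suggests it is a GPU problem. I am using a P40; which GPU model do you have?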

Wimet7 commented 9 months ago

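> Same problem here. I found similar reports in the issues, but none of them has a solution. The community discussion suggests it is a GPU problem. I am using a P40; which GPU model do you have?

Mine is a 4090.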

Zhu-811 commented 9 months ago

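> > Same problem here. I found similar reports in the issues, but none of them has a solution. The community discussion suggests it is a GPU problem. I am using a P40; which GPU model do you have?
>
> Mine is a 4090.

[Image: 7dcef1de7d859a1ac2845dc2cfde536] The GPU is not supported.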
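When comparing cards across reports like these, it helps to include the GPU name and compute capability as reported by the driver. A minimal sketch using `nvidia-smi --query-gpu=name,compute_cap`; it assumes a driver recent enough to support the `compute_cap` field, and the helper names are made up. This thread does not say which compute capability QAnything requires, so the sketch only reports values and does not judge them:

```python
import subprocess


def parse_gpu_info(csv_text):
    """Parse 'name, major.minor' CSV lines into (name, (major, minor))."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, cap = line.rpartition(",")
        major, _, minor = cap.strip().partition(".")
        try:
            gpus.append((name.strip(), (int(major), int(minor or 0))))
        except ValueError:
            # Skip lines where the capability is not numeric.
            continue
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and compute capabilities."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is missing, or the driver rejects the compute_cap field.
        return []
    return parse_gpu_info(out)


if __name__ == "__main__":
    for name, (major, minor) in query_gpus():
        print(f"{name}: compute capability {major}.{minor}")
```

Posting this output together with the Docker image version makes it easier to see whether the failures cluster on particular GPU generations.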

huweibin1983 commented 9 months ago

After updating to the latest version, it starts normally. Before that, I saw the same symptoms as you.

Wimet7 commented 9 months ago

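> After updating to the latest version, it starts normally. Before that, I saw the same symptoms as you.

I pulled the master code yesterday, so it should be the latest, but the version shown for the image I downloaded with Docker is 1.0.9. Is that the version you are running now?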

huweibin1983 commented 9 months ago

The Docker image version is 1.0.9, and the code is the latest.

wuwq commented 9 months ago

My code is the latest, the Docker image version is 1.1.1, and the GPU is a GP100GL. I get the same error.

alibabadoufu commented 9 months ago

I ran into a similar problem too. I am using a V100 card. Is there a solution?