netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0

[BUG] 2.0 fails to start; it loops forever waiting for the backend to start #492

Open gu-feng opened 2 months ago

gu-feng commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

Startup fails. The main_server.log output is shown below.

Expected Behavior

Starts normally.

Environment

- OS:
- NVIDIA Driver:
- CUDA:
- docker:
- docker-compose:
- NVIDIA GPU:
- NVIDIA GPU Memory:

QAnything logs

UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
functions:[{'name': 'duckduckgo_search', 'description': 'duckduckgo_search(query: str, top_k: int) - Search infomation on internet. Useful for when the context can not answer the question. Input should be a search query.', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'search query', 'type': 'string'}}, 'required': ['query']}}]
[2024-09-01 10:54:41 +0000] [19] [INFO] Sanic v23.6.0
[2024-09-01 10:54:41 +0000] [19] [INFO] Goin' Fast @ http://0.0.0.0:8777
[2024-09-01 10:54:41 +0000] [19] [INFO] mode: production, single worker
[2024-09-01 10:54:41 +0000] [19] [INFO] server: sanic, HTTP/1.1
[2024-09-01 10:54:41 +0000] [19] [INFO] python: 3.10.14
[2024-09-01 10:54:41 +0000] [19] [INFO] platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
[2024-09-01 10:54:41 +0000] [19] [INFO] packages: sanic-routing==23.12.0, sanic-ext==23.6.0
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
functions:[{'name': 'duckduckgo_search', 'description': 'duckduckgo_search(query: str, top_k: int) - Search infomation on internet. Useful for when the context can not answer the question. Input should be a search query.', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'search query', 'type': 'string'}}, 'required': ['query']}}]
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
functions:[{'name': 'duckduckgo_search', 'description': 'duckduckgo_search(query: str, top_k: int) - Search infomation on internet. Useful for when the context can not answer the question. Input should be a search query.', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'search query', 'type': 'string'}}, 'required': ['query']}}]
[2024-09-01 10:56:17 +0000] [541] [INFO] Sanic Extensions:
[2024-09-01 10:56:17 +0000] [541] [INFO]   > injection [0 dependencies; 0 constants]
[2024-09-01 10:56:17 +0000] [541] [INFO]   > openapi [http://0.0.0.0:8777/docs]
[2024-09-01 10:56:17 +0000] [541] [INFO]   > http
[2024-09-01 10:56:17 +0000] [541] [INFO]   > templating [jinja2==3.1.4]
Failed to create new connection using: 4089ba24827f47e7b2a0bb367c7d7356
[2024-09-01 10:56:28 +0000] [541] [ERROR] <MilvusException: (code=2, message=Fail connecting to server on host.docker.internal:19540. Timeout)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 143, in _wait_for_channel_ready
    grpc.channel_ready_future(self._channel).result(timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_utilities.py", line 162, in result
    self._block(timeout)
  File "/usr/local/lib/python3.10/site-packages/grpc/_utilities.py", line 106, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sanic/worker/serve.py", line 117, in worker_serve
    return _serve_http_1(
  File "/usr/local/lib/python3.10/site-packages/sanic/server/runners.py", line 223, in _serve_http_1
    loop.run_until_complete(app._server_event("init", "before"))
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/site-packages/sanic/app.py", line 1764, in _server_event
    await self.dispatch(
  File "/usr/local/lib/python3.10/site-packages/sanic/signals.py", line 208, in dispatch
    return await dispatch
  File "/usr/local/lib/python3.10/site-packages/sanic/signals.py", line 183, in _dispatch
    raise e
  File "/usr/local/lib/python3.10/site-packages/sanic/signals.py", line 167, in _dispatch
    retval = await maybe_coroutine
  File "/usr/local/lib/python3.10/site-packages/sanic/app.py", line 1315, in _listener
    await maybe_coro
  File "/workspace/QAnything/qanything_kernel/qanything_server/sanic_api.py", line 53, in init_local_doc_qa
    local_doc_qa.init_cfg(args)
  File "/workspace/QAnything/qanything_kernel/core/local_doc_qa.py", line 74, in init_cfg
    self.milvus_kb = VectorStoreMilvusClient()
  File "/workspace/QAnything/qanything_kernel/core/retriever/vectorstore.py", line 270, in __init__
    self.local_vectorstore: Milvus = SelfMilvus(
  File "/workspace/QAnything/qanything_kernel/core/retriever/vectorstore.py", line 16, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/langchain_community/vectorstores/milvus.py", line 200, in __init__
    self.alias = self._create_connection_alias(connection_args)
  File "/usr/local/lib/python3.10/site-packages/langchain_community/vectorstores/milvus.py", line 283, in _create_connection_alias
    raise e
  File "/usr/local/lib/python3.10/site-packages/langchain_community/vectorstores/milvus.py", line 278, in _create_connection_alias
    connections.connect(alias=alias, **connection_args)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/orm/connections.py", line 414, in connect
    connect_milvus(**kwargs, user=user, password=password, token=token, db_name=db_name)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/orm/connections.py", line 365, in connect_milvus
    gh._wait_for_channel_ready(timeout=timeout)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 146, in _wait_for_channel_ready
    raise MilvusException(
pymilvus.exceptions.MilvusException: <MilvusException: (code=2, message=Fail connecting to server on host.docker.internal:19540. Timeout)>
[2024-09-01 10:56:28 +0000] [19] [ERROR] Not all workers acknowledged a successful startup. Shutting down.

One of your worker processes terminated before startup was completed. Please solve any errors experienced during startup. If you do not see an exception traceback in your error logs, try running Sanic in in a single process using --single-process or single_process=True. Once you are confident that the server is able to start without errors you can switch back to multiprocess mode.
[2024-09-01 10:56:28 +0000] [19] [INFO] Killing Sanic-Server-0-0 [541]
[2024-09-01 10:56:28 +0000] [19] [INFO] Server Stopped
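
The log above stops at a Milvus connection timeout on host.docker.internal:19540. One way to narrow this down is to test from inside the qanything container whether that address accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; the helper names can_connect and describe are hypothetical, and only the host and port come from the log. A successful TCP connection does not prove Milvus is healthy, only that something is listening.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def describe(host: str, port: int, timeout: float = 3.0) -> str:
    """Build a one-line, human-readable reachability report."""
    state = "reachable" if can_connect(host, port, timeout) else "NOT reachable"
    return f"{host}:{port} is {state}"
```

For example, calling print(describe("host.docker.internal", 19540)) inside the container shows whether the address from the MilvusException can be reached.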

Steps To Reproduce

Start with docker compose -f docker-compose-win.yaml up

Anything else?

No response

gu-feng commented 2 months ago

UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  _torch_pytree._register_pytree_node(
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/hub.py", line 389, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/workspace/QAnything/qanything_kernel/dependent_server/embedding_server/embedding_model_configs_v0.0.1'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/QAnything/qanything_kernel/qanything_server/sanic_api.py", line 18, in <module>
    from handler import *
  File "/workspace/QAnything/qanything_kernel/qanything_server/handler.py", line 2, in <module>
    from qanything_kernel.core.local_doc_qa import LocalDocQA
  File "/workspace/QAnything/qanything_kernel/core/local_doc_qa.py", line 9, in <module>
    from qanything_kernel.connector.embedding.embedding_for_online_client import YouDaoEmbeddings
  File "/workspace/QAnything/qanything_kernel/connector/embedding/embedding_for_online_client.py", line 4, in <module>
    from qanything_kernel.utils.general_utils import get_time_async, get_time
  File "/workspace/QAnything/qanything_kernel/utils/general_utils.py", line 231, in <module>
    embedding_tokenizer = AutoTokenizer.from_pretrained(LOCAL_EMBED_PATH, local_files_only=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 737, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 569, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/hub.py", line 454, in cached_file
    raise EnvironmentError(
OSError: Incorrect path_or_model_id: '/workspace/QAnything/qanything_kernel/dependent_server/embedding_server/embedding_model_configs_v0.0.1'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
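
The traceback above suggests that the path passed to AutoTokenizer.from_pretrained was not found as a local folder, so it was then validated as a Hub repo id and rejected. A quick way to confirm this is to check whether the directory exists in the running environment and what it contains. This is a small sketch with a hypothetical helper name, describe_model_dir; only the path itself comes from the log, and the sketch does not verify that the files inside are a valid model.

```python
from pathlib import Path


def describe_model_dir(path: str) -> str:
    """Report whether a local model directory exists and list its entries."""
    p = Path(path)
    if not p.is_dir():
        return f"{path}: missing or not a directory"
    names = sorted(child.name for child in p.iterdir())
    if not names:
        return f"{path}: directory exists but is empty"
    return f"{path}: {len(names)} entries: " + ", ".join(names)
```

Calling describe_model_dir with the embedding_model_configs_v0.0.1 path from the error message, inside the container, shows whether the image actually ships that folder.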

netease-youdao commented 2 months ago

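How are you starting it? The new version must be started through docker-compose; it cannot be launched with Python on its own. Your problem is that the models built into the Docker image are missing, @gu-feng. Startup instructions: https://github.com/netease-youdao/QAnything/tree/qanything-v2?tab=readme-ov-file#step2-enter-the-project-root-directory-and-execute-the-startup-command . You can also build the image yourself; the Dockerfile has been updated: https://github.com/netease-youdao/QAnything/blob/qanything-v2/build_images/Dockerfile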

gu-feng commented 2 months ago

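> How are you starting it? The new version must be started through docker-compose; it cannot be launched with Python on its own. Your problem is that the models built into the Docker image are missing, @gu-feng. Startup instructions: https://github.com/netease-youdao/QAnything/tree/qanything-v2?tab=readme-ov-file#step2-enter-the-project-root-directory-and-execute-the-startup-command . You can also build the image yourself; the Dockerfile has been updated: https://github.com/netease-youdao/QAnything/blob/qanything-v2/build_images/Dockerfile

I deployed and started it with docker compose -f docker-compose-win.yaml up, and I also tried the docker-compose command. The Docker Desktop version is 4.34.0, and the WSL environment is not enabled. @netease-youdao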

liangDYL commented 2 months ago

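I ran into this problem too. When I start with docker-compose, elasticsearch is reported as unhealthy and dies on its own after a while. When I start with docker compose, it keeps waiting for the backend to start, and after a while qanything-local dies.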

fjzuser commented 2 months ago

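I reported this problem before, against the 1.5.1 image version. After all this time it still fails to start properly, on both Windows and WSL. I don't get it: did the developers never test an installation by following the documentation?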

liangDYL commented 2 months ago

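> I reported this problem before, against the 1.5.1 image version. After all this time it still fails to start properly, on both Windows and WSL. I don't get it: did the developers never test an installation by following the documentation?

I'm on a 2019 Mac. I tried installing on two computers and got the same error on both: it keeps waiting for the backend service to start.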

foxworld306 commented 2 months ago

I hit the same error on Win11 + WSL2. Whether I use Linux or Windows, docker-compose or docker compose, the behavior is the same.

jujulike commented 2 months ago

I ran into the same problem too.

foxworld306 commented 2 months ago

I hit the same error on Win11 + WSL2 (CPU only, no GPU). Whether I use Linux or Windows, docker-compose or docker compose, the behavior is the same.

In case it helps, here is the content of main_server.log from my environment:

UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
functions:[{'name': 'duckduckgo_search', 'description': 'duckduckgo_search(query: str, top_k: int) - Search infomation on internet. Useful for when the context can not answer the question. Input should be a search query.', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'search query', 'type': 'string'}}, 'required': ['query']}}]
[2024-09-05 02:35:31 +0000] [13] [INFO] Sanic v23.6.0
[2024-09-05 02:35:31 +0000] [13] [INFO] Goin' Fast @ http://0.0.0.0:8777
[2024-09-05 02:35:31 +0000] [13] [INFO] mode: production, single worker
[2024-09-05 02:35:31 +0000] [13] [INFO] server: sanic, HTTP/1.1
[2024-09-05 02:35:31 +0000] [13] [INFO] python: 3.10.14
[2024-09-05 02:35:31 +0000] [13] [INFO] platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
[2024-09-05 02:35:31 +0000] [13] [INFO] packages: sanic-routing==23.12.0, sanic-ext==23.6.0
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(

liangDYL commented 2 months ago

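> I reported this problem before, against the 1.5.1 image version. After all this time it still fails to start properly, on both Windows and WSL. I don't get it: did the developers never test an installation by following the documentation?

> I'm on a 2019 Mac. I tried installing on two computers and got the same error on both: it keeps waiting for the backend service to start.

It started successfully once this afternoon. The startup wait time in entrypoint.sh is too short; after setting it to unlimited, it was able to start. There is still a port mapping conflict, though, so one port also has to be changed. Startup takes an extremely long time. See https://github.com/netease-youdao/QAnything/issues/474 . These two issues are probably the same problem.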
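
The idea of waiting without an upper limit can be illustrated with a small polling loop. This is not the project's startup script, whose contents are not shown in this thread; it is a generic Python sketch with a hypothetical function name, wait_until, in which timeout=None means "wait forever" and the readiness check is supplied by the caller.

```python
import time
from typing import Callable, Optional


def wait_until(
    check: Callable[[], bool],
    timeout: Optional[float] = None,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True.

    timeout=None waits indefinitely; otherwise give up after timeout seconds
    and return False. sleep and clock are injectable to make testing easy.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        if check():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
```

The check could be any cheap probe, for example a test of whether port 8777 (the port Sanic reports in the logs) accepts connections. An unlimited wait hides real failures, so a large finite timeout may be the safer choice in practice.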

m00nLi commented 2 months ago

And this is supposed to be 2.0?

newsyh commented 2 months ago

Device: Mac, chip: M2. Startup fails. It keeps printing:

2024-09-06 11:23:20 Waiting for the backend service to start...
2024-09-06 11:23:20 等待启动后端服务

and eventually fails to start:

[2024-09-06 03:23:52 +0000] [291] [INFO] Sanic Extensions:
2024-09-06 11:26:16 [2024-09-06 03:23:52 +0000] [291] [INFO]   > injection [0 dependencies; 0 constants]
2024-09-06 11:26:16 [2024-09-06 03:23:52 +0000] [291] [INFO]   > openapi [http://0.0.0.0:8777/docs]
2024-09-06 11:26:16 [2024-09-06 03:23:52 +0000] [291] [INFO]   > http
2024-09-06 11:26:16 [2024-09-06 03:23:52 +0000] [291] [INFO]   > templating [jinja2==3.1.4]
2024-09-06 11:26:16 [2024-09-06 03:23:55 +0000] [291] [ERROR] Connection error caused by: ConnectionError(Connection error caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))))
2024-09-06 11:26:16 Traceback (most recent call last):
2024-09-06 11:26:16   File "/usr/local/lib/python3.10/site-packages/sanic/worker/serve.py", line 117, in worker_serve
2024-09-06 11:26:16     return _serve_http_1(
2024-09-06 11:26:16   File "/usr/local/lib/python3.10/site-packages/sanic/server/runners.py", line 223, in _serve_http_1
2024-09-06 11:26:16     loop.run_until_complete(app._server_event("init", "before"))
2024-09-06 11:26:16 --
2024-09-06 11:26:16     meta, resp_body = self.transport.perform_request(
2024-09-06 11:26:16   File "/usr/local/lib/python3.10/site-packages/elastic_transport/_transport.py", line 342, in perform_request
2024-09-06 11:26:16     resp = node.perform_request(
2024-09-06 11:26:16   File "/usr/local/lib/python3.10/site-packages/elastic_transport/_node/_http_urllib3.py", line 202, in perform_request
2024-09-06 11:26:16     raise err from None
2024-09-06 11:26:16 elastic_transport.ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))))
2024-09-06 11:26:16 [2024-09-06 03:23:55 +0000] [13] [ERROR] Not all workers acknowledged a successful startup. Shutting down.
2024-09-06 11:26:16
2024-09-06 11:26:16 One of your worker processes terminated before startup was completed. Please solve any errors experienced during startup. If you do not see an exception traceback in your error logs, try running Sanic in in a single process using --single-process or single_process=True. Once you are confident that the server is able to start without errors you can switch back to multiprocess mode.
2024-09-06 11:26:16 [2024-09-06 03:23:55 +0000] [13] [INFO] Killing Sanic-Server-0-0 [291]
2024-09-06 11:26:16 [2024-09-06 03:23:55 +0000] [13] [INFO] Server Stopped
2024-09-06 11:26:16 检测到错误信息,请查看上面的输出。 (Error detected; please check the output above.)

What could be the cause?

fjzuser commented 2 months ago

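After switching to the image from Baidu, the Windows version can more or less start, but it cannot parse uploaded files, so it is essentially unusable. The Linux version still fails to start.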

hd19820806 commented 2 months ago

Exactly the same here: it cannot start and keeps waiting for the backend.

x0620x commented 2 months ago

[screenshot] Same problem: it keeps waiting for the backend service to start.

sea-007 commented 2 months ago

UPLOAD_ROOT_PATH: /workspace/QAnything/QANY_DB/content
IMAGES_ROOT_PATH: /workspace/QAnything/qanything_kernel/qanything_server/dist/qanything/assets/file_images
<Logger debug_logger (INFO)>
<Logger qa_logger (INFO)>
<Logger rerank_logger (INFO)>
<Logger embed_logger (INFO)>
<Logger insert_logger (INFO)>
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
functions:[{'name': 'duckduckgo_search', 'description': 'duckduckgo_search(query: str, top_k: int) - Search infomation on internet. Useful for when the context can not answer the question. Input should be a search query.', 'parameters': {'type': 'object', 'properties': {'query': {'description': 'search query', 'type': 'string'}}, 'required': ['query']}}]
[2024-09-08 04:38:08 +0000] [13] [INFO] Sanic v23.6.0
[2024-09-08 04:38:08 +0000] [13] [INFO] Goin' Fast @ http://0.0.0.0:8777
[2024-09-08 04:38:08 +0000] [13] [INFO] mode: production, single worker
[2024-09-08 04:38:08 +0000] [13] [INFO] server: sanic, HTTP/1.1
[2024-09-08 04:38:08 +0000] [13] [INFO] python: 3.10.14
[2024-09-08 04:38:08 +0000] [13] [INFO] platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36
[2024-09-08 04:38:08 +0000] [13] [INFO] packages: sanic-routing==23.12.0, sanic-ext==23.6.0
/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py:441: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(

Win11 + WSL2: it cannot start...

newsyh commented 2 months ago

Why is there no official reply? A lot of people are running into this problem.

gu-feng commented 2 months ago

@netease-youdao, @xixihahaliu

sunguangran commented 2 months ago

+1

jacktpy commented 2 months ago

+1 Hope this gets fixed soon.

hxdtest commented 2 months ago

On Mac, make this change: [image]

dd404x commented 2 months ago

Making the change from the comment above did not work. [image]

l0g1n commented 2 months ago

Same error. ES starts up fine, and changing the wait time did not help either. M2.

gu-feng commented 2 months ago

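Found the problem and solved it: insufficient memory was preventing the service from starting. After increasing the memory, it started successfully and everything runs normally.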
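
To see how much memory the Linux environment behind Docker actually has, one option is to read /proc/meminfo inside the container or WSL2 distro. The sketch below uses hypothetical helper names (parse_meminfo, kib_to_gib, read_meminfo) and only the standard library. The thread does not state how much memory QAnything needs, so no threshold is assumed here.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kib_to_gib(kib: int) -> float:
    """Convert kibibytes to gibibytes."""
    return kib / (1024 * 1024)


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the kernel's memory summary (Linux only)."""
    with open(path, encoding="ascii") as fh:
        return parse_meminfo(fh.read())
```

For instance, kib_to_gib(read_meminfo()["MemTotal"]) gives the total memory visible to the environment in GiB, and the MemAvailable field shows how much is still free for new processes.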

foxworld306 commented 2 months ago

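> Found the problem and solved it: insufficient memory was preventing the service from starting. After increasing the memory, it started successfully and everything runs normally.

Do I need to add physical memory? How much?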

Alreadtstart commented 3 weeks ago

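> Found the problem and solved it: insufficient memory was preventing the service from starting. After increasing the memory, it started successfully and everything runs normally.

Same problem here. Wasn't the cause supposed to be the missing built-in models? Why is the fix now to increase memory?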

Alreadtstart commented 3 weeks ago

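> How are you starting it? The new version must be started through docker-compose; it cannot be launched with Python on its own. Your problem is that the models built into the Docker image are missing, @gu-feng. Startup instructions: https://github.com/netease-youdao/QAnything/tree/qanything-v2?tab=readme-ov-file#step2-enter-the-project-root-directory-and-execute-the-startup-command . You can also build the image yourself; the Dockerfile has been updated: https://github.com/netease-youdao/QAnything/blob/qanything-v2/build_images/Dockerfile

How do I fix the missing built-in models in the Docker image? I rebuilt the ARM image, and the models directory contains the downloaded built-in model files for rank, embed, and base.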