netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
Apache License 2.0

[BUG] Milvus insert failed, please try again later (milvus插入失败,请稍后再试) #65

Status: Open. Opened by Kushizu-Kobuchi 6 months ago.

Kushizu-Kobuchi commented 6 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

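I uploaded a number of files in several formats, including pdf, csv, md, xlsx, pptx, and txt. Some of the Markdown files and all of the xlsx files failed with the message "milvus插入失败,请稍后再试" (Milvus insert failed, please try again later).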

2024-01-24 10:53:20,174 - unstructured - INFO - Reading document from string ...
2024-01-24 10:53:20,175 - unstructured - INFO - Reading document ...
2024-01-24 10:53:20,176 - unstructured - INFO - HTML element instance has no attribute type
2024-01-24 10:53:20,203 - root - INFO - before 2nd split doc lens: 12
2024-01-24 10:53:20,213 - root - INFO - after 2nd split doc lens: 19
2024-01-24 10:53:20,213 - root - INFO - langchain analysis content head: 大模型算法与原理
2024-01-24 10:53:20,227 - root - INFO - split time: 0.061860084533691406 19
2024-01-24 10:53:20,344 - root - INFO - embedding time: 0.1165318489074707 19
2024-01-24 10:53:20,360 - root - INFO - now inser_file 大模型算法与原理.md
2024-01-24 10:53:20,360 - root - INFO - Inserting into Milvus...
2024-01-24 10:53:20,363 - pymilvus.decorators - ERROR - RPC error: [batch_insert], <MilvusException: (code=1100, message=the length (2103) of 6th string exceeds max length (2000): invalid parameter[expected=valid length string][actual=string length exceeds max length])>, <Time:{'RPC start': '2024-01-24 10:53:20.361429', 'RPC error': '2024-01-24 10:53:20.363877'}>
2024-01-24 10:53:20,364 - root - INFO - Milvus insert file_id:ab9b6faa92c8411dab8b8ebee861b94e, file_name:大模型算法与原理.md failed: <MilvusException: (code=1100, message=the length (2103) of 6th string exceeds max length (2000): invalid parameter[expected=valid length string][actual=string length exceeds max length])>
2024-01-24 10:53:20,364 - root - INFO - insert time: 0.01981949806213379
2024-01-24 10:53:20,384 - root - INFO - insert_to_milvus: success num: 0, failed num: 1
2024-01-24 10:53:21,519 - root - INFO - list_docs zzp
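The log shows that one string in the batch (index 6, counting from 0 as the "0th string" in a later error in this thread confirms) is 2103 long while the content field allows 2000. A check before the insert call can name the offending chunk up front. A minimal sketch, assuming the chunks are a plain list of str and that `max_length` mirrors the value declared for the content field in milvus_client.py; `find_oversized` is a hypothetical helper, not part of QAnything or pymilvus. It reports both character and UTF-8 byte counts because which unit Milvus enforces for VARCHAR max_length is treated here as an assumption to verify for your Milvus version.

```python
from typing import List, NamedTuple


class Oversized(NamedTuple):
    index: int       # position in the batch, 0-based like Milvus' "Nth string"
    chars: int       # length in Unicode code points
    utf8_bytes: int  # length of the UTF-8 encoding


def find_oversized(texts: List[str], max_length: int) -> List[Oversized]:
    """Return every text whose char count or UTF-8 byte count exceeds max_length.

    Hypothetical helper: both measures are checked because the unit Milvus
    uses for VARCHAR max_length is an assumption here.
    """
    bad = []
    for i, text in enumerate(texts):
        n_chars = len(text)
        n_bytes = len(text.encode("utf-8"))
        if max(n_chars, n_bytes) > max_length:
            bad.append(Oversized(i, n_chars, n_bytes))
    return bad


if __name__ == "__main__":
    chunks = ["short chunk"] * 6 + ["x" * 2103]
    for item in find_oversized(chunks, max_length=2000):
        print(f"chunk {item.index}: {item.chars} chars, {item.utf8_bytes} bytes")
```

Chinese text is a case where the two measures diverge: most CJK characters take 3 bytes in UTF-8, so a 700-character chunk is already 2100 bytes.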

Expected Behavior

No response

Environment

- OS: Ubuntu 20.04
- NVIDIA Driver: 535.14.02
- CUDA: 12.2
- Docker Compose: 2.24.2
- NVIDIA GPU Memory: 24G

QAnything logs

No response

Steps To Reproduce

No response

Anything else?

No response

shuracwf commented 5 months ago

Same question here. I saw that the latest update raised the max_length of the content field (in fields in /qanything_kernel/connector/database/milvus_client.py) from 2000 to 4000, but I still get the same error.
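Raising max_length only moves the threshold: a chunk longer than the new limit still fails, and, as the next reply explains, an existing setup keeps the Milvus configuration it was created with. A complementary fix is to guarantee on the client side that no chunk exceeds the limit. The sketch below is not QAnything's split logic; `split_to_limit` is a hypothetical helper that cuts a string into pieces whose UTF-8 encoding fits within `max_bytes`, never splitting a multi-byte character and preferring to break after a newline. Measuring bytes is the conservative choice, since a piece that fits in bytes also fits in characters.

```python
from typing import List


def split_to_limit(text: str, max_bytes: int) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes long.

    Cuts only at character boundaries and, when possible, right after the
    last newline inside the window so that lines stay intact.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must fit one UTF-8 character (up to 4 bytes)")
    pieces = []
    start = 0
    while start < len(text):
        # Grow the window one character at a time until the next one would overflow.
        end, size = start, 0
        while end < len(text):
            ch_size = len(text[end].encode("utf-8"))
            if size + ch_size > max_bytes:
                break
            size += ch_size
            end += 1
        # If the text continues, prefer breaking just after a newline in the window.
        if end < len(text):
            nl = text.rfind("\n", start, end)
            if nl > start:
                end = nl + 1
        pieces.append(text[start:end])
        start = end
    return pieces
```

Each resulting piece would then be embedded and inserted as its own row, so the limit in the schema no longer has to anticipate the longest possible chunk.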

xixihahaliu commented 5 months ago

Solution: Pull the latest code (the split logic has been optimized), delete the "volumes" folder under the project root directory (this requires sudo permission; alternatively, enter the container and delete it from there), and then recreate the knowledge base. The reason is that the front end currently defaults to user_id=zzp, and the Milvus configuration is bound to the username, so modifying the code does not update the Milvus configuration of existing users. You need to delete all data manually and recreate it for the change to take effect. Alternatively, you can change the "userId" parameter in the "front_end/src/services/urlConfig.ts" file to a different value.

Wangxunhang commented 3 months ago


Hello, I'd like to ask about a problem with Excel files. At first the insert failed because the string length (49960) exceeded max_length (4000). I changed max_length to 50000, and then got the following error instead:
ERROR - Milvus insert file_id:28cc78570c5b4693ab395b4dad26bed9, file_name:123.xlsx failed: <MilvusException: (code=1100, message=the length (128797) of 0th string exceeds max length (50000): invalid parameter[expected=valid length string][actual=string length exceeds max length])>
The same content placed in a Word document does not cause an error; so far only Excel files have this problem. Could you please advise?
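One plausible reason only Excel fails (an assumption; the thread does not confirm how QAnything's xlsx loader works) is that a whole sheet is loaded as a single element that the text splitter does not break apart, so one chunk grows to 128797 characters. Raising max_length cannot keep up: Milvus caps VARCHAR max_length at 65535, so a 128797-character string can never fit. A row-based alternative is to pack spreadsheet rows into chunks under a size limit and repeat the header row in each chunk so every chunk stays self-describing. `chunk_rows` is a hypothetical helper; it takes rows as lists of strings and leaves reading the .xlsx file (for example with openpyxl or pandas) out of scope.

```python
from typing import List, Sequence


def chunk_rows(header: Sequence[str], rows: Sequence[Sequence[str]],
               max_chars: int, sep: str = " | ") -> List[str]:
    """Pack table rows into text chunks of at most max_chars characters.

    Every chunk starts with the header line so it can be read on its own.
    Rows are never split; a row that cannot fit next to the header raises.
    """
    head = sep.join(header)
    chunks: List[str] = []
    current = head
    for row in rows:
        line = sep.join(row)
        if len(head) + 1 + len(line) > max_chars:
            raise ValueError(f"row too long for max_chars={max_chars}: {line[:40]}...")
        if len(current) + 1 + len(line) > max_chars:
            chunks.append(current)  # current chunk is full: flush it
            current = head          # start the next chunk with the header again
        current += "\n" + line
    if current != head:
        chunks.append(current)
    return chunks
```

A single row longer than the limit on its own still needs a character-level split, which is why the sketch raises instead of silently emitting an oversized chunk.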

fire717 commented 4 weeks ago


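After deleting the "volumes" folder, the server fails to initialize and reports "启动后端服务超时,请检查日志文件 /workspace/qanything_local/logs/debug_logs/sanic_api.log 获取更多信息。" (Timed out starting the backend service; check the log file /workspace/qanything_local/logs/debug_logs/sanic_api.log for more information.) The sanic_api.log contains: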

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sanic/worker/serve.py", line 117, in worker_serve
    return _serve_http_1(
  File "/usr/local/lib/python3.10/dist-packages/sanic/server/runners.py", line 223, in _serve_http_1
    loop.run_until_complete(app._server_event("init", "before"))
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/sanic/app.py", line 1764, in _server_event
    await self.dispatch(
  File "/usr/local/lib/python3.10/dist-packages/sanic/signals.py", line 208, in dispatch
    return await dispatch
  File "/usr/local/lib/python3.10/dist-packages/sanic/signals.py", line 183, in _dispatch
    raise e
  File "/usr/local/lib/python3.10/dist-packages/sanic/signals.py", line 167, in _dispatch
    retval = await maybe_coroutine
  File "/usr/local/lib/python3.10/dist-packages/sanic/app.py", line 1315, in _listener
    await maybe_coro
  File "/workspace/qanything_local/qanything_kernel/qanything_server/sanic_api.py", line 70, in init_local_doc_qa
    local_doc_qa.init_cfg(mode=args.mode)
  File "/workspace/qanything_local/qanything_kernel/core/local_doc_qa.py", line 54, in init_cfg
    self.milvus_summary = KnowledgeBaseManager(self.mode)
  File "/workspace/qanything_local/qanything_kernel/connector/database/mysql/mysql_client.py", line 18, in __init__
    self.check_database_(host, port, user, password, database)
  File "/workspace/qanything_local/qanything_kernel/connector/database/mysql/mysql_client.py", line 32, in check_database_
    cnx = mysql.connector.connect(
  File "/usr/local/lib/python3.10/dist-packages/mysql/connector/pooling.py", line 293, in connect
    return CMySQLConnection(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mysql/connector/connection_cext.py", line 129, in __init__
    self.connect(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mysql/connector/abstracts.py", line 1237, in connect
    self._open_connection()
  File "/usr/local/lib/python3.10/dist-packages/mysql/connector/connection_cext.py", line 313, in _open_connection
    raise get_mysql_exception(
mysql.connector.errors.DatabaseError: 2003 (HY000): Can't connect to MySQL server on 'mysql-container-local:3306' (111)
[2024-06-25 19:51:15 +0800] [137] [ERROR] Not all workers acknowledged a successful startup. Shutting down.

One of your worker processes terminated before startup was completed. Please solve any errors experienced during startup. If you do not see an exception traceback in your error logs, try running Sanic in in a single process using --single-process or single_process=True. Once you are confident that the server is able to start without errors you can switch back to multiprocess mode.
ERROR:sanic.error:Not all workers acknowledged a successful startup. Shutting down.

One of your worker processes terminated before startup was completed. Please solve any errors experienced during startup. If you do not see an exception traceback in your error logs, try running Sanic in in a single process using --single-process or single_process=True. Once you are confident that the server is able to start without errors you can switch back to multiprocess mode.
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-0-0 [695]
INFO:sanic.root:Killing Sanic-Server-0-0 [695]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-1-0 [696]
INFO:sanic.root:Killing Sanic-Server-1-0 [696]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-2-0 [697]
INFO:sanic.root:Killing Sanic-Server-2-0 [697]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-3-0 [698]
INFO:sanic.root:Killing Sanic-Server-3-0 [698]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-4-0 [699]
INFO:sanic.root:Killing Sanic-Server-4-0 [699]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-5-0 [700]
INFO:sanic.root:Killing Sanic-Server-5-0 [700]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-6-0 [701]
INFO:sanic.root:Killing Sanic-Server-6-0 [701]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-7-0 [702]
INFO:sanic.root:Killing Sanic-Server-7-0 [702]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-8-0 [703]
INFO:sanic.root:Killing Sanic-Server-8-0 [703]
[2024-06-25 19:51:15 +0800] [137] [INFO] Killing Sanic-Server-9-0 [704]
INFO:sanic.root:Killing Sanic-Server-9-0 [704]
[2024-06-25 19:51:15 +0800] [137] [INFO] Server Stopped
INFO:sanic.root:Server Stopped
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
llm_api_serve_port: None
rerank_port: 9001
embed_port: 9001
<Logger debug_logger (INFO)> <Logger qa_logger (INFO)>
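The traceback ends in MySQL error 2003 with errno 111 (connection refused on Linux): when the API workers started, nothing was accepting connections on mysql-container-local:3306. After the volumes folder is deleted, the MySQL container has to initialize a fresh data directory, so one plausible cause (not confirmed in this thread) is that the backend tried to connect before MySQL was ready. A minimal readiness check using only the standard library; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Returns True as soon as something accepts the connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused (errno 111), unreachable, timed out, or hostname not resolvable yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("mysql-container-local", 3306, timeout=120)` before the connection attempt in check_database_ (mysql_client.py) would turn the immediate failure into a bounded wait; if it still returns False, the MySQL container's own logs are the next place to look.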