netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0
11.93k stars 1.16k forks

[BUG] <title>No files can be parsed; error: milvus insert error #524

Open wuwu369 opened 2 months ago

wuwu369 commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

No files can be parsed. Every attempt fails with "milvus insert error".

Expected Behavior

No response

Environment

- OS:
- NVIDIA Driver:
- CUDA:
- Docker Compose:
- NVIDIA GPU Memory:

QAnything logs

2024-09-19 09:25:34,592 - [PID: 1784][Sanic-Server-0-0] - [Function: aadd_texts] - ERROR - Failed to insert batch starting at entity: 0/81
2024-09-19 09:25:34,602 - [PID: 1784][Sanic-Server-0-0] - [Function: process_data] - ERROR - milvus insert error: Traceback (most recent call last):
  File "/workspace/QAnything/qanything_kernel/dependent_server/insert_files_serve/insert_files_server.py", line 108, in process_data
    chunks_number, insert_time_record = await asyncio.wait_for(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/workspace/QAnything/qanything_kernel/utils/general_utils.py", line 166, in get_time_async_inner
    res = await func(*args, **kwargs) # 注意这里使用 await 来调用异步函数
  File "/workspace/QAnything/qanything_kernel/core/retriever/parent_retriever.py", line 210, in insert_documents
    return await self.retriever.aadd_documents(docs, parent_chunk_size=parent_chunk_size,
  File "/workspace/QAnything/qanything_kernel/core/retriever/parent_retriever.py", line 141, in aadd_documents
    res = await self.vectorstore.aadd_documents(embed_docs, time_record=time_record)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 153, in aadd_documents
    return await self.aadd_texts(texts, metadatas, **kwargs)
  File "/workspace/QAnything/qanything_kernel/core/retriever/vectorstore.py", line 252, in aadd_texts
    raise e
  File "/workspace/QAnything/qanything_kernel/core/retriever/vectorstore.py", line 242, in aadd_texts
    res: MutationResult = await asyncio.to_thread(
  File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 497, in insert
    res = conn.batch_insert(
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 135, in handler
    raise e from e
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 131, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 170, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 110, in handler
    raise e from e
  File "/usr/local/lib/python3.10/site-packages/pymilvus/decorators.py", line 74, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 566, in batch_insert
    raise err from err
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 549, in batch_insert
    request = self._prepare_batch_insert_request(
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 533, in _prepare_batch_insert_request
    else Prepare.batch_insert_param(collection_name, entities, partition_name, fields_info)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 525, in batch_insert_param
    return cls._parse_batch_request(request, entities, fields_info, location)
  File "/usr/local/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 493, in _parse_batch_request
    raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=('Field data size misaligned for field [vector] ', 'got size=[81] ', 'alignment size=[59]'))>

2024-09-19 09:25:34,603 - [PID: 1784][Sanic-Server-0-0] - [Function: check_and_process] - INFO - time_record: {"parse_time": 0.01, "insert_error": true}
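
Editor's note: the ParamError above says the [vector] field carried 81 entries while the alignment size was 59, which suggests the columns handed to the insert call were not all the same length. The sketch below is not QAnything or pymilvus code. It is a minimal, hypothetical pre-insert check (the helper name find_misaligned_fields and the field name chunk_id are invented for illustration; only "vector" comes from the log) showing how such a mismatch could be caught before the data reaches Milvus.

```python
from typing import Dict, Sequence


def find_misaligned_fields(columns: Dict[str, Sequence]) -> Dict[str, int]:
    """Return {field_name: length} for every column whose length differs
    from that of the first column (used here as the reference row count)."""
    names = list(columns)
    if not names:
        return {}
    expected = len(columns[names[0]])
    return {
        name: len(columns[name])
        for name in names[1:]
        if len(columns[name]) != expected
    }


# Toy data reproducing the sizes from the log: 59 rows vs. 81 vectors.
# "chunk_id" is a made-up field name; "vector" is the field named in the error.
columns = {
    "chunk_id": [f"chunk-{i}" for i in range(59)],
    "vector": [[0.0, 0.0, 0.0, 0.0] for _ in range(81)],
}

mismatched = find_misaligned_fields(columns)
if mismatched:
    print(f"refusing to insert, misaligned fields: {mismatched}")
```

A check like this only locates the symptom. Why fewer or more embeddings than text chunks were produced still has to be traced upstream, for example in the embedding step.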

Steps To Reproduce

Run QAnything with docker compose on CentOS 7; the error reproduces right away.

Anything else?

[Attached image: 微信图片_20240919173420]

Jason-cs18 commented 2 months ago

+1

ouyang11111 commented 2 months ago

I hit the same bug. How can it be fixed?

Alreadtstart commented 2 months ago

Has this been solved?

wuwu369 commented 2 months ago

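Has this been solved?

No. Nobody from the official team is looking at it, so I have moved on to trying other open-source tools.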

guyuet commented 2 months ago

September 24, 2024: Which open-source tool are you using now? Any recommendations?

guyuet commented 2 months ago

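I suggest restarting the service and then checking the Elasticsearch logs for an insufficient-storage error. If one shows up, the cause is insufficient storage; once that is dealt with, it runs.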
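
Editor's note: one way to check the storage theory without reading through the logs is to compare disk usage against Elasticsearch's default disk watermarks (85% low, 90% high, 95% flood stage; at flood stage Elasticsearch puts affected indices into a read-only block, so writes fail). The sketch below is an assumption-laden illustration, not part of QAnything: the function names are invented, the thresholds are the documented defaults and may have been overridden in a given deployment, and the path passed in should be the volume that actually holds the Elasticsearch data.

```python
import shutil

# Elasticsearch's default disk watermarks, as fractions of disk used.
# A deployment may override them, so treat these as assumptions.
LOW_WATERMARK = 0.85
HIGH_WATERMARK = 0.90
FLOOD_STAGE = 0.95


def classify_disk_usage(used_fraction: float) -> str:
    """Map a used-disk fraction to the watermark it has reached."""
    if used_fraction >= FLOOD_STAGE:
        return "flood_stage"  # indices get a read-only block; writes fail
    if used_fraction >= HIGH_WATERMARK:
        return "high"
    if used_fraction >= LOW_WATERMARK:
        return "low"
    return "ok"


def disk_used_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


# "." is a placeholder; point this at the Elasticsearch data volume instead.
fraction = disk_used_fraction(".")
print(f"{fraction:.1%} used -> {classify_disk_usage(fraction)}")
```

Elasticsearch computes its own figures per data path, so this is only a rough approximation of what the node sees.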

xiayi0409 commented 1 month ago

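I suggest restarting the service and then checking the Elasticsearch logs for an insufficient-storage error. If one shows up, the cause is insufficient storage; once that is dealt with, it runs.

Hello, could you go into more detail? I still cannot get it resolved.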

Alreadtstart commented 1 month ago

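I solved it. Two things to try:
1. ES memory usage: check for unused Docker ES containers and shut them down to free memory.
2. Download QAnything's embedding model and put it in the QAnything root directory. If that does not work, go to /QAnything/qanything_kernel/dependent_server/embedding_server, create an embed_models directory there, and put the downloaded embedding model files in it.
Those are the only two causes I found: the downloaded QAnything files do not include the embedding model, so parsing fails immediately; and ES runs out of memory.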
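
Editor's note: the second suggestion above can be sketched as a small check that the embed_models directory exists and actually contains files. This is a hypothetical helper, not QAnything code: the function name check_embed_models is invented, the relative path is taken from the comment above, and the sketch makes no assumption about which model files are expected. The demo runs against a temporary directory so that it does not touch a real checkout.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Relative path taken from the comment above; the directory name
# embed_models is what the commenter says to create.
EMBED_SUBDIR = Path("qanything_kernel/dependent_server/embedding_server/embed_models")


def check_embed_models(repo_root: Path) -> Tuple[Path, List[str]]:
    """Create the embed_models directory if it is missing and return it
    together with the names of the files it currently contains."""
    model_dir = repo_root / EMBED_SUBDIR
    model_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p.name for p in model_dir.iterdir() if p.is_file())
    return model_dir, files


# Demo against a throwaway directory standing in for the QAnything root.
with tempfile.TemporaryDirectory() as tmp:
    model_dir, files = check_embed_models(Path(tmp))
    if not files:
        print(f"{model_dir} is empty: copy the downloaded embedding model files here")
```

In a real setup the argument would be the QAnything checkout (the comment uses /QAnything), and an empty result would mean the model files still need to be copied in.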

siberiah0h commented 1 month ago

Solved. The detailed steps are too long to write out here, so see my blog post: https://www.dataeast.cn/archives/1728728804585

lrmor commented 1 month ago

Solved. The detailed steps are too long to write out here, so see my blog post: https://www.dataeast.cn/archives/1728728804585

A thumbs-up for you.