infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0
20.16k stars, 2k forks

[Bug]: The document parsing failed and exited. #2085

Closed (morler closed this issue 1 month ago)

morler commented 1 month ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

0f95086

Other environment information

ASUS TUF Gaming F15 Pro, 13980HX, 4060 Laptop Graphics Card
Win11
WSL2

Actual behavior

When parsing the document, an error occurred and the program exited. (Screenshot attached: 微信图片_20240825102339)

Expected behavior

No response

Steps to reproduce

Update to the latest version (dev or 0.10.0), upload the document, and then parse it.

Additional information

I believe the issue lies in how the newly added error-output mechanism passes its function arguments. When the LLM returns unexpected content (such as an error message), the program receives a parameter of the wrong type, which in turn triggers the mechanism that interrupts the process.

morler commented 1 month ago

I have tested it, and there is no issue with the code built from the 9b3f5fd commit.

KevinHuSh commented 1 month ago

Could you paste the docker logs? They show the call stack. Or could you share a sample file?

morler commented 1 month ago

2024-08-26 11:19:43 ragflow-server | Traceback (most recent call last):
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 139, in __call__
2024-08-26 11:19:43 ragflow-server |     result, token_count = self._process_document(text, prompt_variables)
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 188, in _process_document
2024-08-26 11:19:43 ragflow-server |     if response.find("ERROR") >=0: raise Exception(response)
2024-08-26 11:19:43 ragflow-server | Exception: ERROR: Error code: 400, with error text {"contentFilter":[{"level":1,"role":"assistant"}],"error":{"code":"1301","message":"系统检测到输入或生成内容可能包含不安全或敏感内容,请您避免输入易产生敏感内容的提示语,感谢您的配合。"}}
2024-08-26 11:19:43 ragflow-server |
2024-08-26 11:19:43 ragflow-server | During handling of the above exception, another exception occurred:
2024-08-26 11:19:43 ragflow-server |
2024-08-26 11:19:43 ragflow-server | Traceback (most recent call last):
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/svr/task_executor.py", line 165, in build
2024-08-26 11:19:43 ragflow-server |     cks = chunker.chunk(row["name"], binary=binary, from_page=row["from_page"],
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/app/knowledge_graph.py", line 18, in chunk
2024-08-26 11:19:43 ragflow-server |     chunks = build_knowlege_graph_chunks(tenant_id, sections, callback,
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/index.py", line 85, in build_knowlege_graph_chunks
2024-08-26 11:19:43 ragflow-server |     graphs.append(_.result().output)
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2024-08-26 11:19:43 ragflow-server |     return self.get_result()
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in get_result
2024-08-26 11:19:43 ragflow-server |     raise self._exception
2024-08-26 11:19:43 ragflow-server |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-08-26 11:19:43 ragflow-server |     result = self.fn(*self.args, **self.kwargs)
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/graphrag/graph_extractor.py", line 145, in __call__
2024-08-26 11:19:43 ragflow-server |     if callback: callback("Knowledge graph extraction error:{}".format(str(e)))
2024-08-26 11:19:43 ragflow-server |   File "/ragflow/rag/svr/task_executor.py", line 80, in set_progress
2024-08-26 11:19:43 ragflow-server |     if prog is not None and prog < 0:
2024-08-26 11:19:43 ragflow-server | TypeError: '<' not supported between instances of 'str' and 'int'

(The message in the 400 response translates to: "The system detected that the input or generated content may contain unsafe or sensitive content. Please avoid prompts that are likely to produce sensitive content. Thank you for your cooperation.")
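For readers following the traceback: the second exception appears to come from the error message being passed as the first positional argument of the progress callback, where a number is expected. The sketch below reproduces that pattern in isolation. The `set_progress` signature and both `report_error_*` helpers are hypothetical stand-ins written for illustration; only the `prog < 0` comparison and the message format come from the log. Passing the text by keyword is one possible remedy and is not necessarily what #2096 does.

```python
def set_progress(prog=None, msg="Processing..."):
    """Hypothetical stand-in for the progress callback (signature assumed)."""
    # Same comparison as the line shown in the traceback.
    if prog is not None and prog < 0:
        msg = "[ERROR] " + msg
    return prog, msg


def report_error_positional(callback, exc):
    # Mirrors the failing call: the string lands in the numeric `prog` slot.
    return callback("Knowledge graph extraction error:{}".format(str(exc)))


def report_error_keyword(callback, exc):
    # Possible remedy: pass the text by keyword so it is never compared to 0.
    return callback(msg="Knowledge graph extraction error:{}".format(str(exc)))


if __name__ == "__main__":
    err = Exception("ERROR: Error code: 400")
    try:
        report_error_positional(set_progress, err)
    except TypeError as e:
        print("positional call fails:", e)
    print(report_error_keyword(set_progress, err))
```

With the keyword form, the upstream error (here, the provider's content-filter response) would be reported through the callback instead of being masked by a `TypeError`.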

JinHai-CN commented 1 month ago

Fixed by #2096

kkrusher commented 1 month ago

Is the latest docker image updated? Or should I build it locally to avoid this issue?

morler commented 1 month ago

The bug has been fixed. Thanks!