infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Question]: Bypassing/purging problematic tasks? #1383

Closed Randname666 closed 1 month ago

Randname666 commented 3 months ago

Describe your problem

A problematic task (how it was generated is unknown) is blocking all newly dispatched tasks, including non-PDF ones. The problematic task is nowhere to be found in the WebUI, so it cannot be canceled there. The backend is currently emitting errors like these constantly:

ragflow-server  | [WARNING] Load term.freq FAIL!
ragflow-server  | Traceback (most recent call last):
ragflow-server  |   File "/ragflow/rag/svr/task_executor.py", line 375, in <module>
ragflow-server  |     main()
ragflow-server  |   File "/ragflow/rag/svr/task_executor.py", line 294, in main
ragflow-server  |     rows = collect()
ragflow-server  |   File "/ragflow/rag/svr/task_executor.py", line 117, in collect
ragflow-server  |     assert tasks, "{} empty task!".format(msg["id"])
ragflow-server  | AssertionError: 2077fa703a6311efbc6f0242ac120006 empty task!
ragflow-mysql   | 2024-07-05T01:08:14.129120Z 28 [Note] Aborted connection 28 to db: 'rag_flow' user: 'root' host: '172.19.0.6' (Got an error reading communication packets)

Running docker compose down followed by docker compose up doesn't resolve the issue.

Is there a way to manually remove this problematic task? Additionally, is there an internal mechanism for purging/canceling tasks on error?
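
For context: the assertion in collect() fires when the executor pops a queue message whose task record can no longer be found, i.e. a stale message is stuck in the queue. A minimal inspection sketch, assuming the queue is the Redis stream rag_flow_svr_queue with the connection details mentioned later in this thread (adjust host, port, and password for your deployment):

import redis

r = redis.Redis(host="0.0.0.0", port=6379, password="infini_rag_flow")

# If the key is a Redis stream, list every pending entry; a stale entry
# should carry the same id as the "empty task!" assertion above.
for entry_id, fields in r.xrange("rag_flow_svr_queue", "-", "+"):
    print(entry_id, fields)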

guoyuhao2330 commented 3 months ago

This problem is caused by dirty data generated by multiple reboots. However, it does not affect operation; you can ignore it.

Randname666 commented 3 months ago

> This problem is caused by dirty data generated by multiple reboots. However, it does not affect operation; you can ignore it.

But unfortunately, that one problematic task is blocking all other newly dispatched tasks. Will it simply go away on its own if I wait?

I ended up purging all of the Docker volumes used by RAGFlow. That fixed the issue, but it also wiped all the documents, which is definitely not something to do if a lot of documents have already been processed.

Sephieroth commented 1 month ago

I have the same problem.

I finally solved the problem by deleting the queue data in Redis:

import redis

r = redis.Redis(host="0.0.0.0", port=6379, password="infini_rag_flow")
keys = r.keys("*")  # keys are [b"rag_flow_svr_queue"]
r.delete("rag_flow_svr_queue")

After deleting the data, the parsing process works well.
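
Note that deleting the whole rag_flow_svr_queue key drops every pending task, not just the broken one. If other tasks are still queued, a more surgical sketch is possible, assuming the queue is a Redis stream whose entries store a JSON payload under a "message" field containing the task id (this layout may vary by RAGFlow version, so verify with the inspection snippet earlier in this thread first):

import json
import redis

BAD_TASK_ID = "2077fa703a6311efbc6f0242ac120006"  # task id from the error log above

r = redis.Redis(host="0.0.0.0", port=6379, password="infini_rag_flow")

# Remove only the stream entry that references the broken task.
for entry_id, fields in r.xrange("rag_flow_svr_queue", "-", "+"):
    payload = fields.get(b"message")
    if payload and json.loads(payload).get("id") == BAD_TASK_ID:
        r.xdel("rag_flow_svr_queue", entry_id)
        print("deleted stream entry", entry_id)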