langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai
46.79k stars · 6.61k forks

When using the knowledge base API to repeatedly delete and re-upload files, after a certain number of cycles the knowledge base node can no longer query any data #8667

Open · lzzzzzzzzz opened this issue 6 days ago

lzzzzzzzzz commented 6 days ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I use the knowledge base API to repeatedly delete and re-upload files, after a certain number of cycles the knowledge retrieval node can no longer query any data, even though the same data can still be found with the knowledge base's retrieval test.
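To pin down the "certain number of cycles", the reproduction can be scripted as a loop that deletes, re-uploads, and queries until retrieval comes back empty. This is a minimal sketch: `delete_document`, `upload_document`, and `retrieve` are hypothetical callables you would wrap around the Knowledge API calls (or around a run of the knowledge retrieval node versus the retrieval test, to compare the two paths). They are not Dify functions.

```python
from typing import Callable, Optional


def find_failing_cycle(
    delete_document: Callable[[str], None],
    upload_document: Callable[[], str],
    retrieve: Callable[[str], list],
    initial_doc_id: str,
    query: str,
    max_cycles: int = 50,
) -> Optional[int]:
    """Repeat delete -> re-upload -> retrieve.

    Returns the first 1-based cycle whose retrieval result is empty,
    or None if every cycle up to max_cycles returned at least one hit.
    """
    doc_id = initial_doc_id
    for cycle in range(1, max_cycles + 1):
        delete_document(doc_id)     # hypothetical wrapper: delete the current document
        doc_id = upload_document()  # hypothetical wrapper: upload again, get the new document id
        if not retrieve(query):     # hypothetical wrapper: query the knowledge base
            return cycle
    return None
```

Running the same loop once against the retrieval node and once against the retrieval test would show whether only one of the two paths loses data, which is the discrepancy described above.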

Attachment: 知识库清空重传.yml.txt ("clear and re-upload knowledge base")


✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 6 days ago

The knowledge base node being unable to query any data after repeated deletes and re-uploads is likely related to the dataset_not_initialized error condition. This error indicates that the knowledge base is still initializing or indexing, so it cannot process new queries or operations until that process completes.

Supporting evidence from the document:

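| Error code | HTTP status | Message |
|---|---|---|
| dataset_not_initialized | 400 | The knowledge base is still initializing or indexing. Please wait. |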

This suggests that if you delete and re-upload files too frequently, the system may not have enough time to complete the indexing process, leading to a state where the knowledge base is perpetually initializing or indexing, and thus unable to handle queries.

Additionally, the test cases in test_client.py insert deliberate pauses (time.sleep(1)) between operations such as creating, updating, and deleting documents, which suggests the system needs time to finish each operation before the next one is issued. Adding delays between the delete and re-upload operations might therefore help mitigate this issue [1][2].
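A fixed time.sleep(1) only helps if indexing happens to finish within a second. A sturdier variant waits on the indexing status before the next delete. This is a minimal sketch, assuming a hypothetical `get_status` callable that wraps whatever status check you use. Dify's Knowledge API documents an indexing-status endpoint for uploaded batches, but verify the path and the exact status strings against the API reference for 0.8.3. The `"completed"` and `"error"` values below are assumptions.

```python
import time
from typing import Callable


class IndexingTimeout(Exception):
    """Raised when a document does not finish indexing within the allotted time."""


def wait_until_indexed(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() until it returns "completed", or raise on error/timeout.

    get_status is a hypothetical callable returning the current indexing
    status string; "completed" and "error" are assumed status values.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return
        if status == "error":
            raise RuntimeError("indexing reported an error")
        if clock() >= deadline:
            raise IndexingTimeout(f"last status was {status!r}")
        sleep(interval)
```

In the delete/re-upload loop, calling `wait_until_indexed(...)` right after each upload replaces the fixed pause. `sleep` and `clock` are injectable so the helper can be exercised without real waiting.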


JohnJyong commented 6 days ago

Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?

PolarPeak commented 3 days ago

Please run the single node (knowledge retrieval node) on its own. Does the issue still exist?

My case is slightly different: I am unable to retrieve paragraphs newly added to an existing knowledge base. I checked the API logs, which show the following messages:

{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-09-26T04:52:02Z"}
{"action":"read_disk_use","level":"warning","msg":"disk usage currently at 89.52%, threshold set to 80.00%","path":"/var/lib/weaviate","time":"2024-09-26T04:52:31Z"}
[2024-09-26 03:16:43,377: INFO/MainProcess] Start process document: ce79ecce-445b-4835-9052-720c7491dc5b
[2024-09-26 03:16:47,939: WARNING/MainProcess] {'error': [{'message': 'store is read-only'}]}
[2024-09-26 03:16:47,993: INFO/MainProcess] Processed dataset: 896b6e41-8c92-4090-96cf-50bd643e1c7c latency: 4.618616024963558
[2024-09-26 03:16:48,008: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[9423068a-d1c4-48a9-9730-d1519d697d6d] succeeded in 4.634086176753044s: None
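Two details in these logs stand out when read together: Weaviate reports disk usage at 89.52%, and the worker log shows a write rejected with `store is read-only` even though the indexing task is reported as succeeded. My understanding is that Weaviate marks shards read-only once disk usage crosses `DISK_USE_READONLY_PERCENTAGE` (90% by default); treat that threshold as an assumption to verify for your Weaviate version. The sketch below scans collected log lines for both conditions. `scan_logs` is a hypothetical helper, not part of Dify or Weaviate.

```python
import re
from typing import List

# Assumption: Weaviate switches shards to read-only at DISK_USE_READONLY_PERCENTAGE,
# which defaults to 90%; pass a different value if your deployment overrides it.
DEFAULT_READONLY_THRESHOLD = 90.0

# Matches Weaviate's disk-usage warning, e.g.
# "disk usage currently at 89.52%, threshold set to 80.00%"
_DISK_USAGE = re.compile(r"disk usage currently at ([\d.]+)%")


def scan_logs(lines: List[str], readonly_threshold: float = DEFAULT_READONLY_THRESHOLD) -> List[str]:
    """Return human-readable findings about disk pressure and rejected writes."""
    findings: List[str] = []
    for line in lines:
        match = _DISK_USAGE.search(line)
        if match:
            usage = float(match.group(1))
            if usage >= readonly_threshold:
                findings.append(f"disk usage {usage:.2f}% is at or above the read-only threshold")
            else:
                margin = readonly_threshold - usage
                findings.append(f"disk usage {usage:.2f}% is {margin:.2f} points below the read-only threshold")
        # The rejected write shows up in the worker log as an error payload.
        if "store is read-only" in line:
            findings.append("vector store rejected a write: store is read-only")
    return findings
```

Because the indexing task still logs `succeeded`, a check like this is one way to catch writes that silently never reached the vector store. If a store is already read-only, freeing disk space may not be enough on its own; as I understand the Weaviate documentation, the affected shards also have to be set back to `READY`.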

I'm not sure whether these errors are related. See also https://github.com/langgenius/dify/issues/8768

lzzzzzzzzz commented 3 days ago

Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?

Yes, the issue still exists, and no errors show up in the Docker logs (dify-api).