lzzzzzzzzz opened this issue 6 days ago
The knowledge base node's failure to return any data after files are repeatedly deleted and re-uploaded is likely related to the condition reported by the error code `dataset_not_initialized`. This error means the knowledge base is still initializing or indexing, so it cannot process new queries or operations until that process completes.
Supporting evidence from the document:
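| Error code | HTTP status | Message |
| --- | --- | --- |
| dataset_not_initialized | 400 | The knowledge base is still being initialized or indexed. Please wait. |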
This suggests that if files are deleted and re-uploaded in quick succession, the system may not have time to finish indexing, leaving the knowledge base stuck in an initializing or indexing state in which it cannot answer queries.
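If the knowledge base really is still indexing, a client can treat `dataset_not_initialized` as retryable instead of failing on the first attempt. This is a minimal sketch, assuming errors come back as HTTP 400 with a JSON body whose `code` field holds the error code from the table above; `send_request` is a hypothetical callable standing in for whatever HTTP client you use.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical response shape: (HTTP status code, parsed JSON body).
Response = Tuple[int, Dict[str, Any]]


def call_with_retry(
    send_request: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry while the API reports dataset_not_initialized, with exponential backoff.

    Assumption: the "still indexing" error is HTTP 400 with
    {"code": "dataset_not_initialized", ...}. Any other response (success or a
    different error) is returned immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send_request()
        still_indexing = status == 400 and body.get("code") == "dataset_not_initialized"
        if not still_indexing:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    # All attempts hit dataset_not_initialized; hand back the last response.
    return status, body
```

Injecting `sleep` keeps the helper testable and lets callers swap in their own timing policy.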
Additionally, the test cases in `test_client.py` include deliberate pauses (`time.sleep(1)`) between operations such as creating, updating, and deleting documents. This suggests that giving the system time to process each operation helps avoid leaving the knowledge base in an uninitialized state, so adding a delay between the delete and re-upload steps may mitigate this issue [1][2].
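Instead of a fixed `time.sleep(1)`, a client can poll until indexing has finished before issuing the next delete or upload. This is a sketch under assumptions: `get_status` is a hypothetical callable returning the document's current indexing status as a string (in Dify this would presumably wrap the dataset API's indexing-status endpoint; check the API reference for your version), and treating `completed` and `error` as the terminal states is an assumption.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to whatever your API version reports.
TERMINAL_STATES = {"completed", "error"}


def wait_until_indexed(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires.

    Returns the terminal status so the caller can tell success from failure;
    raises TimeoutError if indexing is still in progress at the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"indexing still '{status}' after {timeout}s")
        sleep(interval)
```

Polling waits only as long as indexing actually takes, whereas a fixed one-second pause can be too short for large files and wasteful for small ones.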
Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?
My case is slightly different: I can't retrieve paragraphs that I added to an existing knowledge base. I checked the API logs, which show the following messages:
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-09-26T04:52:02Z"}
{"action":"read_disk_use","level":"warning","msg":"disk usage currently at 89.52%, threshold set to 80.00%","path":"/var/lib/weaviate","time":"2024-09-26T04:52:31Z"}
[2024-09-26 03:16:43,377: INFO/MainProcess] Start process document: ce79ecce-445b-4835-9052-720c7491dc5b
[2024-09-26 03:16:47,939: WARNING/MainProcess] {'error': [{'message': 'store is read-only'}]}
[2024-09-26 03:16:47,993: INFO/MainProcess] Processed dataset: 896b6e41-8c92-4090-96cf-50bd643e1c7c latency: 4.618616024963558
[2024-09-26 03:16:48,008: INFO/MainProcess] Task tasks.document_indexing_task.document_indexing_task[9423068a-d1c4-48a9-9730-d1519d697d6d] succeeded in 4.634086176753044s: None
I'm not sure if these errors are related. https://github.com/langgenius/dify/issues/8768
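The `store is read-only` warning appears alongside a disk-usage warning (89.52% used against an 80% threshold). Weaviate logs a warning once disk usage passes `DISK_USE_WARNING_PERCENTAGE` (default 80%) and marks shards read-only once it passes `DISK_USE_READONLY_PERCENTAGE` (default 90%), after which writes fail. Disk usage this close to the read-only limit is therefore a plausible cause of the failed writes, although it has not been confirmed here. The sketch below checks a directory against those thresholds; the threshold values are the defaults as I understand them (verify your deployment's environment), and the percentage from `shutil.disk_usage` may differ slightly from Weaviate's own figure.

```python
import shutil
from typing import Tuple

# Assumed Weaviate defaults; check your deployment's environment variables:
# DISK_USE_WARNING_PERCENTAGE=80, DISK_USE_READONLY_PERCENTAGE=90
WARNING_PERCENT = 80.0
READONLY_PERCENT = 90.0


def classify_usage(
    used: int,
    total: int,
    warn: float = WARNING_PERCENT,
    readonly: float = READONLY_PERCENT,
) -> Tuple[float, str]:
    """Return (percent used, state) for the given byte counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = used / total * 100.0
    if percent >= readonly:
        state = "read-only"
    elif percent >= warn:
        state = "warning"
    else:
        state = "ok"
    return percent, state


def check_path(path: str) -> Tuple[float, str]:
    """Classify disk usage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used, usage.total)
```

For example, calling `check_path` on the host directory that backs `/var/lib/weaviate` shows how close the volume is to the read-only limit.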
Please run the single node (knowledge-retrieval-node) on its own. Does the issue still exist?
Yes, and no errors show up in the Docker log (dify-api).
Self Checks
Dify version
0.8.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
When the knowledge base API is used to repeatedly delete and re-upload files, after a certain number of rounds the knowledge base node can no longer query any data, although the same data can still be found with the knowledge base retrieval test.
知识库清空重传.yml.txt (attachment; the file name means "knowledge base clear and re-upload")
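The reproduction can be automated with a small harness that repeats the delete-and-re-upload cycle and records the first round in which the two query paths disagree. All three callables are hypothetical placeholders, not Dify API names: `clear_and_reupload` stands for deleting and re-uploading the files through the knowledge base API, `query_via_node` for running the workflow's knowledge retrieval node, and `query_via_retrieval_test` for the knowledge base retrieval test. Each query callable returns the number of hits its path found.

```python
import time
from typing import Callable, Optional


def find_divergence(
    clear_and_reupload: Callable[[], None],
    query_via_node: Callable[[], int],
    query_via_retrieval_test: Callable[[], int],
    max_rounds: int = 50,
    settle_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Repeat delete + re-upload; return the first round (1-based) in which the
    node finds nothing while the retrieval test still finds data, or None if
    that never happens within max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        clear_and_reupload()
        if settle_seconds:
            sleep(settle_seconds)  # optional pause so indexing can finish
        node_hits = query_via_node()
        test_hits = query_via_retrieval_test()
        if node_hits == 0 and test_hits > 0:
            return round_no
    return None
```

Running it once with `settle_seconds=0` and once with a generous pause would help show whether the failure depends on timing, as the indexing explanation above suggests.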
✔️ Expected Behavior
No response
❌ Actual Behavior
No response