Open glacierck opened 2 days ago
The "Documents not found" error from the indexing_status API occurs when the DocumentService.get_batch_documents method returns no documents for the given dataset ID and batch. This could be due to:
Ensure that the batch ID is correct and that the dataset and documents exist. Also, verify that you have the necessary permissions to access the dataset [1][2].
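As a quick way to check that the dataset ID and batch are being passed the way the endpoint expects, here is a minimal sketch that only builds the request and does not send it. The path layout `/datasets/{dataset_id}/documents/{batch}/indexing-status`, the Bearer-token header, and the helper name `build_indexing_status_request` are assumptions for illustration; check them against the API reference for your Dify version.

```python
from urllib.parse import quote
from urllib.request import Request


def build_indexing_status_request(
    base_url: str, api_key: str, dataset_id: str, batch: str
) -> Request:
    """Build (but do not send) a GET request for a batch's indexing status.

    The URL layout and the Authorization header are assumptions, not
    confirmed by this thread.
    """
    if not dataset_id or not batch:
        raise ValueError("dataset_id and batch must both be non-empty")
    url = (
        f"{base_url.rstrip('/')}/datasets/{quote(dataset_id, safe='')}"
        f"/documents/{quote(batch, safe='')}/indexing-status"
    )
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")
```

Printing `request.full_url` before sending makes it easy to see which batch value is actually being queried.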
To continue talking to Dosu, mention @dosu.
@dosu The update upload has been confirmed as successful in the web UI, but the progress information for the update cannot be queried.
The issue with querying the progress information, despite a successful update upload, might be due to:
Progress Update Logic: The onProgress function might not be triggered correctly, affecting the progress display.
Error Handling: If an error occurs during the upload, the progress is set to -2, which could prevent proper display.
Finalization: The progress should be set to 100 upon completion. If this step is missed, it might not reflect correctly [1].
Ensure these areas are functioning as expected to resolve the issue.
@dosu The update has been confirmed as successful. The problem is that the 'batch' returned by the update API cannot be used to retrieve progress information. The 'batch' returned by the create operation does not have this issue.
The issue with the 'batch' returned by the API not retrieving progress information is likely because the batch identifier is missing or incorrect. The 'batch' from the new operation includes a batch identifier, which is necessary to track progress using the indexing-status endpoint. Ensure that the batch ID is correctly generated and passed to the indexing_status API to retrieve progress information [1].
@dosu After testing, I found that the progress of an update can be queried through the 'batch' returned when the document was added, which again confirms that the 'batch' returned by the update is invalid. The document's progress information can only be retrieved through the initial 'batch'. This looks like a serious bug!
It seems like the issue is with the batch ID returned by the update_by_file API. If the batch ID is invalid, it won't match any existing records, leading to the "Documents not found" error when querying progress.
To address this, you should:
Verify Batch ID Generation: Ensure that the batch ID generated during the update is correct and matches the format used in the create_by_file API.
Check Database Records: Confirm that the batch ID is correctly stored and associated with the documents in the database.
Debug Logging: Implement logging to track the batch ID generation and assignment process to identify any discrepancies.
These steps should help in diagnosing and fixing the issue with the batch ID during updates.
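The debug-logging step could look roughly like the sketch below, run on the client side against the parsed JSON bodies of both calls. It assumes only that each response carries a top-level "batch" field, as described in this thread; the helper names `extract_batch` and `batches_differ` are hypothetical.

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("batch_debug")


def extract_batch(operation: str, response_body: dict) -> Optional[str]:
    """Return the 'batch' value from a parsed response body, logging what was seen."""
    batch = response_body.get("batch")
    if not batch:
        logger.warning("%s: response carried no batch: %r", operation, response_body)
        return None
    logger.info("%s: batch=%s", operation, batch)
    return str(batch)


def batches_differ(create_body: dict, update_body: dict) -> bool:
    """Log both batch values and report whether the update produced a different one."""
    created = extract_batch("create_by_file", create_body)
    updated = extract_batch("update_by_file", update_body)
    differ = created != updated
    logger.info("batches differ: %s", differ)
    return differ
```

Keeping both values in the log makes it clear which batch was later passed to the indexing_status query when a "Documents not found" error appears.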
@dosu The progress information of 'create_by_file' is incorrectly overwritten by the progress information of 'update_by_file', and progress can only be queried through the 'batch' returned by 'create_by_file'.
The issue arises because both update_by_file and create_by_file operations return a batch identifier, but only the batch from create_by_file is used to query progress. This is due to the system design, which tracks progress using the batch from create_by_file as specified in the API documentation [1][2].
To resolve this, ensure that the batch ID from update_by_file is correctly generated and stored, and verify that it can be used to query progress. You might need to check the implementation to ensure that both operations handle batch IDs consistently.
Self Checks
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
1. Use the 'update_by_file' API to update the document; it returns a 'batch' normally.
2. Use the 'indexing_status' API to query the progress information for that 'batch'.
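The two steps can be sketched as a small script. This is only an illustration under assumptions: the endpoint paths, the `send` transport callable, and the `fake_send` stub (including its 404 status) are hypothetical and merely mimic the behavior reported here; the file upload payload that update_by_file needs is omitted.

```python
from typing import Callable, Dict, Tuple

# (method, path) -> (HTTP status, parsed JSON body); a hypothetical transport.
Send = Callable[[str, str], Tuple[int, Dict]]


def reproduce(send: Send, dataset_id: str, document_id: str) -> Tuple[str, int, Dict]:
    """Run the two reproduction steps and return (batch, status, body)."""
    # Step 1: update the document and read the batch from the response.
    status, body = send(
        "POST", f"/datasets/{dataset_id}/documents/{document_id}/update_by_file"
    )
    if status != 200 or "batch" not in body:
        raise RuntimeError(f"update failed: {status} {body}")
    batch = body["batch"]
    # Step 2: query indexing progress with the batch returned by the update.
    status, body = send(
        "GET", f"/datasets/{dataset_id}/documents/{batch}/indexing-status"
    )
    return batch, status, body


def fake_send(method: str, path: str) -> Tuple[int, Dict]:
    """Stub mimicking the behavior reported in this issue; not real server output."""
    if method == "POST":
        return 200, {"batch": "update-batch"}
    return 404, {"message": "Documents not found"}
```

Swapping `fake_send` for a real HTTP call against a self-hosted instance would show whether the second step returns progress data or the error.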
✔️ Expected Behavior
Progress information is returned.
❌ Actual Behavior
P.S. Querying with the 'batch' returned by the create_by_file API does not have this issue.