langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

API Bug: The 'indexing_status' API cannot query the batch returned by the 'update_by_file' API #9221

Open glacierck opened 2 days ago

glacierck commented 2 days ago

Self Checks

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

1. Use the 'update_by_file' API to update an existing document; it returns a 'batch' as normal.
2. Use the 'indexing_status' API to query progress information for that 'batch'.
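For anyone trying to reproduce this, the status query can be sketched in Python roughly as follows. This is a minimal sketch, not a verified reproduction: the base URL, both endpoint paths, the bearer-token header, and all helper names are assumptions and should be checked against the API reference for your Dify version.

```python
import json
import urllib.error
import urllib.request

# Assumed value: replace with the base URL of your own deployment.
API_BASE = "http://localhost/v1"


def update_by_file_url(dataset_id: str, document_id: str) -> str:
    # Assumed path for the update endpoint named in this report (step 1).
    return f"{API_BASE}/datasets/{dataset_id}/documents/{document_id}/update_by_file"


def indexing_status_url(dataset_id: str, batch: str) -> str:
    # Assumed path for the status endpoint (step 2); `batch` comes from step 1.
    return f"{API_BASE}/datasets/{dataset_id}/documents/{batch}/indexing-status"


def query_indexing_status(dataset_id: str, batch: str, api_key: str):
    """Return (HTTP status, parsed JSON body) for a batch status query."""
    request = urllib.request.Request(
        indexing_status_url(dataset_id, batch),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # A 404 with code "not_found" is the behavior described in this issue.
        return err.code, json.loads(err.read().decode("utf-8"))
```

Calling `query_indexing_status` once with the batch from 'create_by_file' and once with the batch from 'update_by_file' should show whether only the second call returns 404.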

✔️ Expected Behavior

Progress information for the batch is returned.

❌ Actual Behavior

{
    "code": "not_found",
    "message": "Documents not found.",
    "status": 404
}

P.S. The 'batch' returned by the create_by_file API does not have this issue.

dosubot[bot] commented 2 days ago

The "Documents not found" error from the indexing_status API occurs when the DocumentService.get_batch_documents method returns no documents for the given dataset ID and batch. This could be due to:

  1. The batch ID provided does not match any existing batch in the dataset.
  2. The documents associated with the batch have been deleted or were never successfully saved.
  3. The dataset might not exist, or the user does not have the necessary permissions to access it.

Ensure that the batch ID is correct and that the dataset and documents exist. Also, verify that you have the necessary permissions to access the dataset [1][2].
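The lookup described above can be illustrated with a small in-memory model. This is a hypothetical stand-in, not Dify's actual implementation: the `Document` fields, the `NotFound` exception, and both function bodies are assumptions made only to show how a batch that matches no stored document leads to the "Documents not found." response.

```python
from dataclasses import dataclass


@dataclass
class Document:
    dataset_id: str
    batch: str
    name: str


class NotFound(Exception):
    """Stands in for the HTTP 404 'Documents not found.' response."""


def get_batch_documents(documents, dataset_id, batch):
    # Hypothetical lookup: keep only documents matching both dataset ID and batch.
    return [d for d in documents if d.dataset_id == dataset_id and d.batch == batch]


def indexing_status(documents, dataset_id, batch):
    found = get_batch_documents(documents, dataset_id, batch)
    if not found:
        # No stored document carries this batch value, so the query fails.
        raise NotFound("Documents not found.")
    return [{"name": d.name, "batch": d.batch} for d in found]
```

In this model, any batch value that was never written onto a stored document fails the same way, regardless of whether the upload itself succeeded.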

To continue talking to Dosu, mention @dosu.

glacierck commented 2 days ago

@dosu I have confirmed in the web UI that the update upload succeeded, but the progress information for the update cannot be queried.

dosubot[bot] commented 2 days ago

The issue with querying the progress information, despite a successful update upload, might be due to:

  1. Progress Update Logic: The onProgress function might not be triggered correctly, affecting the progress display.

  2. Error Handling: If an error occurs during the upload, the progress is set to -2, which could prevent proper display.

  3. Finalization: The progress should be set to 100 upon completion. If this step is missed, it might not reflect correctly [1].

Ensure these areas are functioning as expected to resolve the issue.


glacierck commented 2 days ago

@dosu The update has been confirmed to be successful. The issue is that the 'batch' returned by the update API cannot be used to retrieve progress information. The 'batch' returned by the create (new document) operation does not have this issue.

dosubot[bot] commented 2 days ago

The issue with the 'batch' returned by the API not retrieving progress information is likely because the batch identifier is missing or incorrect. The 'batch' from the new operation includes a batch identifier, which is necessary to track progress using the indexing-status endpoint. Ensure that the batch ID is correctly generated and passed to the indexing_status API to retrieve progress information [1].


glacierck commented 2 days ago

@dosu After testing, I found that the progress of the update can be queried using the 'batch' returned when the document was first added, which confirms again that the 'batch' returned by the update is invalid. The document's progress information can only be retrieved through the initial 'batch'. This looks like a serious bug!
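Given that finding, one possible client-side workaround until this is fixed is to remember the batch from the initial create call and reuse it for status queries after an update. The sketch below is only an illustration: `BatchRegistry` and its methods are hypothetical names, not part of Dify or its API.

```python
from typing import Dict, Optional


class BatchRegistry:
    """Remembers the batch returned when each document was first created."""

    def __init__(self) -> None:
        self._initial_batch: Dict[str, str] = {}

    def record_create(self, document_id: str, batch: str) -> None:
        # Store the batch from the create response, keyed by document ID.
        self._initial_batch[document_id] = batch

    def batch_for_status(
        self, document_id: str, update_batch: Optional[str] = None
    ) -> Optional[str]:
        # Prefer the initial batch; fall back to the update batch only
        # when the document was never recorded.
        return self._initial_batch.get(document_id, update_batch)
```

This relies on the behavior observed above, so it would need revisiting once the 'batch' returned by the update becomes queryable.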

dosubot[bot] commented 2 days ago

It seems like the issue is with the batch ID returned by the update_by_file API. If the batch ID is invalid, it won't match any existing records, leading to the "Documents not found" error when querying progress.

To address this, you should:

  1. Verify Batch ID Generation: Ensure that the batch ID generated during the update is correct and matches the format used in the create_by_file API.

  2. Check Database Records: Confirm that the batch ID is correctly stored and associated with the documents in the database.

  3. Debug Logging: Implement logging to track the batch ID generation and assignment process to identify any discrepancies.

These steps should help in diagnosing and fixing the issue with the batch ID during updates.
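The debug-logging step could look something like the sketch below. It is a hypothetical helper: `diagnose_batches` and the `fetch_status` callable (assumed to return an HTTP status code and a parsed body for a dataset ID and batch) are illustration names, not existing Dify functions.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("batch_diagnostics")

# Assumed shape: fetch_status(dataset_id, batch) -> (http_status, json_body)
StatusFetcher = Callable[[str, str], Tuple[int, dict]]


def diagnose_batches(
    fetch_status: StatusFetcher, dataset_id: str, batches: Dict[str, str]
) -> Dict[str, bool]:
    """Query each labeled batch and report whether it can be resolved."""
    results: Dict[str, bool] = {}
    for label, batch in batches.items():
        status_code, _body = fetch_status(dataset_id, batch)
        # Log every attempt so mismatched batch values are easy to spot.
        logger.info("batch %s (%s) -> HTTP %s", batch, label, status_code)
        results[label] = status_code == 200
    return results
```

Passing the batches from both the create and the update response, for example `{"create": ..., "update": ...}`, makes it easy to see which of the two the status endpoint accepts.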


glacierck commented 2 days ago

@dosu The progress information from 'create_by_file' is incorrectly overwritten by the progress information from 'update_by_file', and progress can only be queried through the 'batch' returned by 'create_by_file'.

dosubot[bot] commented 2 days ago

The issue arises because both update_by_file and create_by_file operations return a batch identifier, but only the batch from create_by_file is used to query progress. This is due to the system design, which tracks progress using the batch from create_by_file as specified in the API documentation [1][2].

To resolve this, ensure that the batch ID from update_by_file is correctly generated and stored, and verify that it can be used to query progress. You might need to check the implementation to ensure that both operations handle batch IDs consistently.
