langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The retry index button? #7354

Closed backlux closed 2 months ago

backlux commented 2 months ago

Dify version

0.7.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

If indexing gets interrupted, it does not retry automatically. The dataset is huge, and the frontend offers no control to manually retry the whole process.

So, the previous issue got closed without clarification:

"We do have a retry button, in knowledge -> Documents, you can see. the button on the top left corner."

  1. This happens when the knowledge base has been created but is not ready yet.
  2. I can stop and restart indexing for a single document in the dataset, but how do I restart it for all of them?

I can't find the retry button at the top left; I have looked everywhere. Please help, thanks.

image

No button here?

image

✔️ Expected Behavior

Have retry button

❌ Actual Behavior

No retry button

dosubot[bot] commented 2 months ago

Hi @backlux! I'm here to help you with any bugs, questions, or contribution-related inquiries you might have.

The retry button to restart the indexing process for all documents in Dify version 0.7.0, self-hosted via Docker, will only be visible if there are documents with indexing errors. The button is not displayed if there are no documents that failed to index.

The RetryButton component checks the errorDocs data to determine if there are any documents with errors. If errorDocs?.total is 0, the button will not be rendered. Here is the relevant part of the code:

useEffect(() => {
  if (errorDocs?.total === 0)
    dispatch({ type: 'success' })
  else
    dispatch({ type: 'error' })
}, [errorDocs?.total])

if (indexState.value === 'success')
  return null

To see the retry button, you need to have documents that failed to index. Once there are such documents, the button will appear, allowing you to retry the indexing process [1].

To continue talking to Dosu, mention @dosu.

backlux commented 2 months ago

@dosu How do I mark one document as failed? Its status is "indexing", but it's not actually being indexed.

backlux commented 2 months ago

@dosu Or is there any way to trigger the index retry process via an API endpoint?

dosubot[bot] commented 2 months ago

To mark a document as failed or to trigger the index retry process via an API endpoint in Dify version 0.7.0, self-hosted via Docker, you can use the following methods:

Marking a Document as Failed

Currently, there isn't a direct API endpoint to mark a document as failed. However, you can manage document statuses and handle indexing errors using the existing API endpoints. For example, you can delete a document and re-upload it to trigger the indexing process again.

Triggering the Index Retry Process via API Endpoint

You can trigger the index retry process by wrapping the retry_document_indexing_task function in a custom API endpoint. Here is an example of how you can set up such an endpoint to queue this task:

from flask import Flask, request, jsonify
# Adjust this import to wherever the task module sits in your deployment.
from retry_document_indexing_task import retry_document_indexing_task

app = Flask(__name__)

@app.route('/api/retry_index', methods=['POST'])
def retry_index():
    # silent=True yields None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    dataset_id = data.get('dataset_id')
    document_ids = data.get('document_ids')

    if not dataset_id or not document_ids:
        return jsonify({"error": "dataset_id and document_ids are required"}), 400

    # .delay() hands the task to the worker queue and returns without waiting for it
    retry_document_indexing_task.delay(dataset_id, document_ids)
    return jsonify({"message": "Index retry process triggered"}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

In this example, you can trigger the index retry process by sending a POST request to the /api/retry_index endpoint with a JSON payload containing the dataset_id and document_ids [1].

Example API Request

Here is an example of how to make a POST request to the /api/retry_index endpoint:

curl --location --request POST 'http://your-server-ip:5000/api/retry_index' \
--header 'Content-Type: application/json' \
--data-raw '{
    "dataset_id": "your_dataset_id",
    "document_ids": ["document_id_1", "document_id_2"]
}'

This will trigger the retry process for the specified documents in the dataset.
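The same request can also be built from Python. This is only a minimal sketch using the standard library; it assumes the custom /api/retry_index route from the example above is running, and the helper names `build_retry_request` and `send_retry_request` are made up for illustration and are not part of Dify.

```python
import json
import urllib.request


def build_retry_request(base_url, dataset_id, document_ids):
    """Build (but do not send) a POST request for the custom /api/retry_index route."""
    payload = json.dumps(
        {"dataset_id": dataset_id, "document_ids": list(document_ids)}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/retry_index",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_retry_request(req, timeout=10):
    """Send the request; the explicit timeout keeps the client from hanging forever."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

Calling `send_retry_request(build_retry_request("http://your-server-ip:5000", "your_dataset_id", ["document_id_1"]))` would then mirror the curl command.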


crazywoola commented 2 months ago

As the bot said, the retry button is only available for failed tasks. So if you want a retry button in this case, it should be a feature request rather than a bug report.

backlux commented 2 months ago

The way you talk is so ...

It was stuck in indexing for over an hour... "the retry button is only available for the failed tasks"

And you say, ok, it should be a feature request.

Just think: if you really want to push this product to a larger audience, many more people will face this. And all this is such an easy bug to fix, just add timeout to make it "FAIL"... That's all you need, instead of saying it's not even a bug.

What are you even talking about???

backlux commented 2 months ago

The retry logic is shit. If it works and you make it better, then it's a feature. When it is not working, you call it a bug!

backlux commented 2 months ago

I guess that's just how you want your product to be.

crazywoola commented 2 months ago

Sorry for the misunderstanding. However, in https://github.com/langgenius/dify/issues/7353 you didn't provide much useful information, unlike in this one. So based on the information you provided there, I could only tell you that there is a 'retry button'.

"And all this is such an easy bug to fix, just add timeout to make it 'FAIL'... That's all you need, instead of saying it's not even a bug."

I agree with this one. I am aware that we lack a timeout task to mark stuck documents as 'FAIL'.

We do want to make it better. I will try to fix it later.
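A rough sketch of what such a timeout sweep could look like. This is not Dify's implementation: the `Doc` record, the status strings and the one-hour default are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Doc:
    """Hypothetical stand-in for a document record; not Dify's actual model."""
    id: str
    status: str           # assumed values: "indexing", "completed", "error"
    started_at: datetime  # when indexing began
    error: str = ""


def mark_stuck_as_failed(docs, now, timeout=timedelta(hours=1)):
    """Flip documents that have been 'indexing' longer than `timeout` to 'error'.

    Returns the ids of the documents that were changed.
    """
    changed = []
    for doc in docs:
        if doc.status == "indexing" and now - doc.started_at > timeout:
            doc.status = "error"
            doc.error = f"indexing timed out after {timeout}"
            changed.append(doc.id)
    return changed
```

Run periodically, a sweep like this would move stuck documents into the failed state, at which point the existing retry button would apply to them.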

backlux commented 2 months ago

I did find out what the problem really is: when using Ollama for embedding, if you set the URL to something like http://docker.internal.host, the request never times out and all tasks get stuck. (Maybe a DNS error from Docker, I don't know.)

A workaround is to use the IP address instead; that temporarily fixed it for me.
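For anyone hitting the same thing, here is a small sketch for checking whether the embedding URL is reachable at all before starting a large indexing job. It uses only the Python standard library; the helper names are made up, and the check only tests a TCP connection, not whether Ollama itself answers correctly.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Extract (host, port) from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    `timeout` bounds the connect attempt; name resolution itself is governed
    by the system resolver and may take longer.
    """
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False
```

Running this inside the same container as the worker, once with the hostname and once with the IP address, should show quickly whether the hostname is the part that fails.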