langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Dify knowledge base new paragraphs cannot be recalled #8768

Closed PolarPeak closed 1 month ago

PolarPeak commented 1 month ago

Dify version

0.7.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

After the upgrade, historical knowledge base entries can be recalled, but newly added knowledge base paragraphs cannot be recalled, although annotated data can be retrieved.

[image] Historical knowledge base entries can be recalled normally.

[image] [image] Newly added paragraphs cannot be recalled properly.

✔️ Expected Behavior

I hope that both newly added and historical paragraphs can be recalled properly, as they were before.

❌ Actual Behavior

After the upgrade, historical knowledge base entries can be recalled, but newly added knowledge base paragraphs cannot be recalled, although annotated data can be retrieved.

dosubot[bot] commented 1 month ago

I found a similar issue that might be relevant to your problem. In Dify version 0.7.2, when newly added knowledge base paragraphs cannot be recalled while historical entries can, the likely cause is the indexing and retrieval settings.

When you upload new documents, they need to undergo chunking, cleaning, and embedding before they can be effectively recalled. If these steps are not properly configured or completed, the new entries may not be indexed correctly, leading to recall issues.

Ensure that the following steps are correctly followed for new documents:

  1. Segment and Clean the Document: Make sure the document is properly segmented and cleaned.
  2. Index Mode and Retrieval Settings: Choose and configure the appropriate index mode and retrieval settings.
  3. Embedding: Wait for the chunks to be embedded.

Additionally, configure the retrieval mode to improve the precision of retrieved content. Navigate to Context > Retrieval Settings > Rerank Setting to set the retrieval mode for the knowledge bases. This ensures that the retriever scans all linked knowledge bases for relevant text content and consolidates the results effectively [1][2].

If these steps are not properly executed, the new entries may not be indexed correctly, leading to recall issues.
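
As a rough illustration of why a chunk that has not finished embedding cannot be recalled by vector search, here is a minimal toy sketch. It is not Dify's implementation: `ToyIndex`, `toy_embed`, and the bag-of-words "embedding" are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyIndex:
    """Toy index: chunks are searchable only after they have been embedded."""

    def __init__(self):
        self.pending = []  # chunks added but not yet embedded
        self.vectors = []  # (chunk, vector) pairs that search can see

    def add(self, chunk):
        self.pending.append(chunk)

    def embed_pending(self):
        for chunk in self.pending:
            self.vectors.append((chunk, toy_embed(chunk)))
        self.pending.clear()

    def search(self, query, top_k=1):
        q = toy_embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.vectors]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


index = ToyIndex()
index.add("refund policy lasts thirty days")
before = index.search("refund policy")  # chunk is still pending, so nothing is found
index.embed_pending()
after = index.search("refund policy")   # chunk is now embedded and can be recalled
```

In this sketch `before` is empty and `after` contains the new chunk, which mirrors the symptom in this issue: if the embedding step for new paragraphs never completes, only previously indexed entries are returned.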

To continue talking to Dosu, mention @dosu.

PolarPeak commented 1 month ago

@dosu How do I do that specifically?

dosubot[bot] commented 1 month ago

To ensure newly added knowledge base paragraphs can be recalled in Dify version 0.7.2, follow these steps:

  1. Upload Documents:

    • Select and upload the document to your knowledge base.
  2. Segment and Clean the Document:

    • Automatic Mode: Dify will handle segmentation and cleaning automatically.
    • Custom Mode: Manually configure chunking and cleaning rules if needed:

      chunking_rules = {
          "delimiter": "\n",
          "max_chunk_length": 4000,
          "chunk_overlap": 0.1
      }

      preprocessing_rules = {
          "replace_consecutive_spaces": True,
          "delete_urls_emails": True
      }
  3. Indexing Mode:

    • Choose the indexing method:
      • High-Quality Mode: Embedding model for higher accuracy.
      • Economical Mode: Keyword indexing to reduce costs.
      • Q&A Mode: Generates Q&A pairs (community version only).
  4. Embedding:

    • Wait for the text chunks to be embedded into numerical vectors.
  5. Retrieval Settings:

    • Configure retrieval settings:
      • Vector Search: Uses query vectors for semantic similarity.
      • Full-Text Search: Keyword-based retrieval.
      • Hybrid Search: Combines vector and full-text search, with optional rerank model.
  6. ETL Configuration (Optional):

    • Use ETL processes for better data recall. Dify supports both Dify ETL and Unstructured ETL.

By following these steps, you ensure that new documents are properly processed and can be recalled effectively [1][2][3].
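
To make the Custom Mode rules in step 2 more concrete, the following is a minimal sketch of what such cleaning and chunking rules could do to a piece of text. It is an assumption-laden illustration, not Dify's actual splitter: `clean_text` and `chunk_text` are hypothetical helpers, the regular expressions are simplified, and treating `chunk_overlap` as a fraction of `max_chunk_length` is my reading of the value `0.1` above rather than documented behavior.

```python
import re


def clean_text(text, replace_consecutive_spaces=True, delete_urls_emails=True):
    """Hypothetical cleaner mirroring the two preprocessing rules above."""
    if delete_urls_emails:
        text = re.sub(r"https?://\S+", "", text)               # simplified URL pattern
        text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "", text)   # simplified email pattern
    if replace_consecutive_spaces:
        text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


def chunk_text(text, delimiter="\n", max_chunk_length=4000, chunk_overlap=0.1):
    """Hypothetical chunker: split on the delimiter, then window oversized pieces.

    Assumption: chunk_overlap is a fraction of max_chunk_length, measured in characters.
    """
    if max_chunk_length < 1 or not 0 <= chunk_overlap < 1:
        raise ValueError("need max_chunk_length >= 1 and 0 <= chunk_overlap < 1")
    overlap = int(max_chunk_length * chunk_overlap)
    step = max_chunk_length - overlap
    chunks = []
    for piece in text.split(delimiter):
        piece = piece.strip()
        if not piece:
            continue
        if len(piece) <= max_chunk_length:
            chunks.append(piece)
            continue
        # Slide a window over pieces that exceed the maximum length.
        for start in range(0, len(piece), step):
            chunks.append(piece[start:start + max_chunk_length])
            if start + max_chunk_length >= len(piece):
                break
    return chunks


cleaned = clean_text("Contact  me at a@b.com or https://x.y/z  now")
chunks = chunk_text("abcdefghij\nxy", max_chunk_length=4, chunk_overlap=0.5)
```

With the small values used here, `cleaned` is `"Contact me at or now"` and `chunks` is `["abcd", "cdef", "efgh", "ghij", "xy"]`, which shows how an overlap makes consecutive chunks share text at their boundaries.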

PolarPeak commented 1 month ago

Creating a new knowledge base also fails to recall entries.

crazywoola commented 1 month ago

Duplicate of https://github.com/langgenius/dify/issues/8667