langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Knowledge Retrieval Node Incorrectly Reverts to RerankModel #8854

Closed Kevin9703 closed 1 month ago

Kevin9703 commented 1 month ago

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

  1. Configure a knowledge retrieval node to use WeightedScore.
  2. Save or reopen the workflow. The node is now using RerankModel instead.

✔️ Expected Behavior

The node should consistently maintain its WeightedScore configuration as set by the user.

Root Cause: After investigating the API calls, we found that the frontend does not consistently send the required data fields to the backend via the draft API. Specifically, the 'reranking_mode' and 'weights' parameters are sometimes missing from the API requests, so the saved configuration no longer matches the node's intended settings and the node reverts to RerankModel.
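To make the missing-field condition concrete, here is a minimal sketch of a check that could be run against a captured draft request. Only the key names 'reranking_mode' and 'weights' come from this report; the function name, the shape of the dictionary, and the sample values are assumptions for illustration, not Dify's actual schema.

```python
# Hypothetical check: which of the two fields named in this report are
# missing from a retrieval config taken from a draft API request body?
REQUIRED_FIELDS = ("reranking_mode", "weights")


def missing_rerank_fields(retrieval_config: dict) -> list:
    """Return the required keys that are absent (or None) in the config."""
    return [key for key in REQUIRED_FIELDS if retrieval_config.get(key) is None]


# Sample payloads; the structure and values are assumed, not Dify's schema.
complete = {
    "reranking_mode": "weighted_score",
    "weights": {"vector": 0.7, "keyword": 0.3},
}
incomplete = {"top_k": 4}  # both fields dropped, as described in this issue

print(missing_rerank_fields(complete))
print(missing_rerank_fields(incomplete))
```

Logging the result of such a check for each draft save would show how often the fields are actually dropped.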

❌ Actual Behavior

I have identified an intermittent bug in the workflow involving the knowledge retrieval node. When setting the node to use WeightedScore, it occasionally and unexpectedly reverts to using the RerankModel instead. While this issue is not consistently reproducible, it occurs frequently enough to cause concern.

dosubot[bot] commented 1 month ago

The intermittent reversion from WeightedScore to RerankModel in the knowledge retrieval node of Dify version 0.8.3 could be due to the retrieval strategy configuration. Specifically, if the retrieval mode is set to "Multi-way Recall," the system requires a Rerank model to be configured for result re-ranking. This could cause the node to revert to using the RerankModel even if it was initially set to use WeightedScore [1][2].

Additionally, the retrieval settings in Dify's high-quality indexing mode indicate that the Rerank model is essential for semantic reordering of retrieved document results. This model is required for the TopK and Score Threshold settings to take effect during the reranking step [3].

The code in knowledge_retrieval_node.py shows that the retrieval mode and reranking configuration are explicitly checked and applied during the retrieval process. If the retrieval mode is set to "MULTIPLE" and the reranking mode is "reranking_model," the system will use the specified reranking model. If the reranking mode is "weighted_score," it will use the provided weights instead [4].
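The branching described above might look roughly like the following sketch. It is not the code from knowledge_retrieval_node.py: the function name and return values are hypothetical, and the fallback to the rerank model when 'reranking_mode' is absent is an assumption chosen because it would match the symptom reported in this issue.

```python
from typing import Optional

# Mode strings as quoted in the comment above; exact values in Dify may differ.
MULTIPLE = "MULTIPLE"


def choose_rerank_strategy(
    retrieval_mode: str,
    reranking_mode: Optional[str],
    weights: Optional[dict],
) -> str:
    """Return which reranking path a multi-retrieval request would take."""
    if retrieval_mode != MULTIPLE:
        return "not_applicable"
    if reranking_mode == "weighted_score" and weights is not None:
        return "weighted_score"
    # Assumption: anything else, including a missing reranking_mode,
    # ends up on the rerank-model path.
    return "reranking_model"


print(choose_rerank_strategy(MULTIPLE, "weighted_score", {"vector": 0.7}))
print(choose_rerank_strategy(MULTIPLE, None, None))
```

Under this assumption, a request that omits 'reranking_mode' and 'weights' would silently take the rerank-model branch even though the user selected WeightedScore.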

This behavior aligns with the observed issue where the node reverts to using the RerankModel.
