Knowledge Retrieval Node Incorrectly Reverts to RerankModel

langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit the Issue in English, otherwise it will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Configure a knowledge retrieval node to use WeightedScore.
Save or reopen the workflow. The node is found to be using RerankModel instead.

✔️ Expected Behavior

The node should consistently maintain its WeightedScore configuration as set by the user.

Root Cause: After investigating the API calls, we found that the frontend is inconsistently sending the required data fields to the backend via the draft API. Specifically, the 'reranking_mode' and 'weights' parameters are sometimes missing from the API requests, causing a mismatch between the node's intended settings and its actual configuration (reverting to the RerankModel).
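One way to confirm this from captured draft API requests is to check each knowledge retrieval config for the two fields named above. The following is a minimal sketch, not Dify code: only the field names 'reranking_mode' and 'weights' come from this report, while the helper name, the flat dict layout, and the shape of the weights value are assumptions made for illustration.

```python
from typing import Any

# Field names taken from the report; everything else here is hypothetical.
REQUIRED_FIELDS = ("reranking_mode", "weights")


def missing_rerank_fields(retrieval_config: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent (or None) in a config dict."""
    return [name for name in REQUIRED_FIELDS if retrieval_config.get(name) is None]


# Assumed example payloads; the weights structure is illustrative only.
complete = {
    "reranking_mode": "weighted_score",
    "weights": {"vector": 0.7, "keyword": 0.3},
}
incomplete = {"top_k": 4}

print(missing_rerank_fields(complete))
print(missing_rerank_fields(incomplete))
```

Running a check like this over the request bodies recorded in the browser's network panel should show which saves dropped the fields.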

❌ Actual Behavior

I have identified an intermittent bug in the workflow involving the knowledge retrieval node. When setting the node to use WeightedScore, it occasionally and unexpectedly reverts to using the RerankModel instead. While this issue is not consistently reproducible, it occurs frequently enough to cause concern.

The intermittent reversion from WeightedScore to RerankModel in the knowledge retrieval node of Dify version 0.8.3 could be due to the retrieval strategy configuration. Specifically, if the retrieval mode is set to "Multi-way Recall," the system requires a Rerank model to be configured for result re-ranking. This could cause the node to revert to using the RerankModel even if it was initially set to use WeightedScore ^[1]^[2].

Additionally, the retrieval settings in Dify's high-quality indexing mode indicate that the Rerank model is essential for semantic reordering of retrieved document results. This model is required for the TopK and Score Threshold settings to take effect during the reranking step ^[3].

The code in knowledge_retrieval_node.py shows that the retrieval mode and reranking configuration are explicitly checked and applied during the retrieval process. If the retrieval mode is set to "MULTIPLE" and the reranking mode is "reranking_model," the system will use the specified reranking model. If the reranking mode is "weighted_score," it will use the provided weights instead ^[4].
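The branching described above can be sketched roughly as follows. This is an illustration of the described logic only, not the contents of knowledge_retrieval_node.py: the function name, parameters, and return shape are hypothetical, and only the string values "MULTIPLE", "reranking_model", and "weighted_score" are taken from the description.

```python
from typing import Any, Optional


def choose_reranking(
    retrieval_mode: str,
    reranking_mode: Optional[str],
    reranking_model: Optional[dict[str, Any]] = None,
    weights: Optional[dict[str, Any]] = None,
) -> tuple[str, Any]:
    """Pick the reranking strategy for a retrieval request (hypothetical helper)."""
    if retrieval_mode != "MULTIPLE":
        # Other retrieval modes are out of scope for this sketch.
        return ("none", None)
    if reranking_mode == "reranking_model":
        return ("reranking_model", reranking_model)
    if reranking_mode == "weighted_score":
        return ("weighted_score", weights)
    # A missing or unknown mode is rejected here instead of silently defaulting.
    raise ValueError(f"unsupported reranking_mode: {reranking_mode!r}")


strategy, payload = choose_reranking(
    "MULTIPLE", "weighted_score", weights={"vector": 0.7, "keyword": 0.3}
)
print(strategy, payload)
```

Raising on a missing mode is a design choice of this sketch. It makes a dropped 'reranking_mode' field visible as an error instead of letting the node quietly end up on a different strategy.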

This behavior aligns with the observed issue where the node reverts to using the RerankModel.

