Workflow query did not return results, but recall test was successful.

tzdmwwdz commented 2 weeks ago

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Sometimes the knowledge retrieval step in the workflow returns no results, even though the backend knowledge base receives the query and recall testing on the knowledge base works without issue.

✔️ Expected Behavior

The retrieval results in the workflow are consistent with the recall results the knowledge base returns in the recall test.

❌ Actual Behavior

dosubot[bot] commented 2 weeks ago

@tzdmwwdz Hello there! I'm here to help you with bug solving, answering questions, or becoming a contributor. Let's work together to resolve any issues you may have.

To resolve the inconsistency between the workflow query results and the backend knowledge base recall results in your self-hosted Dify 0.6.10 (Docker) deployment, first check which recall mode you are using. Dify supports two recall modes: N-to-1 recall mode and multi-path recall mode.

N-to-1 recall mode:
- This mode selects the most relevant single dataset based on the user's intent and dataset descriptions.
- It is suitable for applications with fewer datasets that are distinctly different from each other.
- If the dataset descriptions are not clear or the datasets are too similar, the recall effectiveness may be impacted.

Multi-path recall mode:
- This mode matches the user's intent with all datasets simultaneously and retrieves relevant text fragments from multiple datasets.
- It then reorders the results using a Rerank model to select the best match for the user's query.
- This mode is recommended for applications with multiple datasets as it does not rely on the model's inference ability or dataset descriptions.
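To make the difference between the two modes concrete, here is a toy Python sketch. It is not Dify's implementation: the word-overlap scorer is a hypothetical stand-in for embedding search plus a Rerank model, and every function and dataset name is illustrative. It shows how N-to-1 recall can return nothing when dataset descriptions are vague (it routes the query to the wrong dataset), while multi-path recall still finds the relevant chunk because it searches every dataset.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    dataset: str
    text: str
    score: float  # relevance score from the toy scorer below


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text.
    Hypothetical stand-in for vector search and a Rerank model."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def n_to_1_recall(query, datasets, descriptions, top_k=3):
    """N-to-1: route the query to the single dataset whose description looks
    most relevant, then search only that dataset."""
    best = max(datasets, key=lambda name: overlap_score(query, descriptions[name]))
    hits = [Chunk(best, text, overlap_score(query, text)) for text in datasets[best]]
    hits = [h for h in hits if h.score > 0]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]


def multi_path_recall(query, datasets, top_k=3):
    """Multi-path: search every dataset, pool all candidates, then rank the
    pooled list and keep the top_k overall."""
    pooled = [
        Chunk(name, text, overlap_score(query, text))
        for name, texts in datasets.items()
        for text in texts
    ]
    pooled = [c for c in pooled if c.score > 0]
    return sorted(pooled, key=lambda c: c.score, reverse=True)[:top_k]


# Illustrative data: the descriptions are deliberately vague.
datasets = {
    "hr": ["annual leave policy allows 20 days", "remote work requires manager approval"],
    "it": ["reset your vpn password via the portal", "laptops are replaced every three years"],
}
descriptions = {"hr": "company documents", "it": "misc notes"}
query = "how do I reset vpn password"

print(n_to_1_recall(query, datasets, descriptions))  # [] : routed to "hr", nothing matches
print(multi_path_recall(query, datasets))  # one chunk, from the "it" dataset
```

The vague descriptions tie at a score of zero, so the router falls back to the first dataset; this is the failure mode the N-to-1 bullets above warn about.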

To improve consistency, you should:

Switch to Multi-path Recall Mode: If you are currently using N-to-1 recall mode, consider switching to multi-path recall mode. This mode is more robust for applications with multiple datasets and can provide higher-quality recall results.
Configure Rerank Model: Ensure that you have configured the Rerank model correctly to improve the semantic reordering of the retrieved documents.

Here is a brief guide on how to switch to multi-path recall mode:

Go to your Dify application settings.
Navigate to the recall mode settings.
Select multi-path recall mode.
Configure the Rerank model API as required.

Additionally, ensure that you have properly configured the Knowledge Base Retrieval Node by following these steps:

Selecting the Query Variable: Ensure that the query variable represents the user's input question. In the "Start" node of conversational applications, the system pre-sets "sys.query" as the user input variable.
Choosing the Knowledge Base to Query: Add an existing knowledge base from Dify. For instructions on creating a knowledge base, refer to the knowledge base help documentation.
Configuring the Retrieval Strategy: Modify the indexing strategy and retrieval mode for the knowledge base within the node. For detailed explanations, refer to the retrieval mode explanation in the help documentation.

Moreover, ensure that your workflow configurations are synchronized correctly. You can use the WorkflowToolConfigurationUtils class to check and validate your workflow graph variables and tool parameter configurations. Here is a brief overview of the relevant methods:

check_parameter_configurations: Validates the parameter configurations.
get_workflow_graph_variables: Retrieves the workflow graph variables from the start node.
check_is_synced: Ensures that the variables and tool configurations are synchronized, raising a ValueError if there is a mismatch.
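The sync check described above boils down to comparing two sets of names. The sketch below is a hypothetical, simplified stand-in, not the real WorkflowToolConfigurationUtils code: the function names are illustrative, and the graph shape (a "nodes" list whose start node carries "data" with "type": "start" and a "variables" list) is an assumption about how a workflow graph might be stored.

```python
def get_start_variables(graph: dict) -> list[str]:
    """Return the variable names declared on the start node.
    Assumed (hypothetical) graph shape:
    {"nodes": [{"data": {"type": "start", "variables": [{"variable": name}, ...]}}]}"""
    for node in graph.get("nodes", []):
        data = node.get("data", {})
        if data.get("type") == "start":
            return [v["variable"] for v in data.get("variables", [])]
    return []


def check_is_synced_sketch(graph: dict, tool_param_names: list[str]) -> None:
    """Hypothetical sync check: raise ValueError unless every start-node
    variable has a matching tool parameter and vice versa."""
    variables = set(get_start_variables(graph))
    params = set(tool_param_names)
    if variables != params:
        raise ValueError(
            "workflow variables and tool parameters are out of sync: "
            f"missing parameters {sorted(variables - params)}, "
            f"unexpected parameters {sorted(params - variables)}"
        )


graph = {"nodes": [{"id": "start", "data": {"type": "start", "variables": [{"variable": "query"}]}}]}
check_is_synced_sketch(graph, ["query"])  # in sync: no exception
```

Comparing sets means a mere reordering of variables is not treated as a mismatch; a stricter check could also compare types or required flags.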

By following these steps and ensuring proper synchronization of your workflow configurations, you should be able to resolve the inconsistency between the workflow query results and the backend knowledge base recall results ^[1]^[2]^[3]^[4].


tzdmwwdz commented 2 weeks ago

Multi-path recall is already in use.

tzdmwwdz commented 2 weeks ago

The configuration should not be the problem, because some queries do return correct results from the knowledge base.

langgenius / dify