langgenius / dify

Error in using knowledge base recall test #5350

Open xddcode opened 1 week ago

xddcode commented 1 week ago

Dify version

0.6.11

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

  1. Uploaded a PDF file
  2. [Two screenshots; images not preserved]

✔️ Expected Behavior

normal

❌ Actual Behavior

error

dosubot[bot] commented 1 week ago

Hello @xddcode, I'm here to help you with any bugs, questions, or contributions you may have. Let's work together to solve this issue.

The "3 validation errors for ParsingModel[InlineResponse20015]" message raised by the knowledge base recall test in Dify version 0.6.11 reports that extra inputs are not permitted. To resolve it, make sure the inputs being set contain no fields beyond those the ParsingModel permits.

In the provided code, the produce function from the immer library is used to create new input objects while ensuring that only the necessary fields are included. Specifically, the setInputs function is designed to remove either single_retrieval_config or multiple_retrieval_config based on the retrieval_mode.

Here is the relevant part of the code that handles this:

const setInputs = useCallback((s: KnowledgeRetrievalNodeType) => {
  const newInputs = produce(s, (draft) => {
    if (s.retrieval_mode === RETRIEVE_TYPE.multiWay)
      delete draft.single_retrieval_config
    else
      delete draft.multiple_retrieval_config
  })
  doSetInputs(newInputs)
}, [doSetInputs])

This ensures that only the appropriate configuration is included based on the retrieval mode, preventing extra inputs from being set.
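As a rough illustration of the same idea without immer, the sketch below copies the inputs and drops whichever config does not match the retrieval mode. The `RetrievalInputs` type, the `stripUnusedConfig` name, and the `'single'`/`'multiple'` mode strings are assumptions made for this example and are not Dify's actual definitions.

```typescript
// Hypothetical, simplified shape; the real KnowledgeRetrievalNodeType has more fields.
type RetrievalMode = 'single' | 'multiple'

interface RetrievalInputs {
  retrieval_mode: RetrievalMode
  single_retrieval_config?: { model: { provider: string; name: string } }
  multiple_retrieval_config?: { top_k: number; score_threshold?: number | null }
}

// Return a shallow copy that keeps only the config matching retrieval_mode.
// The original object is left untouched.
function stripUnusedConfig(inputs: RetrievalInputs): RetrievalInputs {
  const copy: RetrievalInputs = { ...inputs }
  if (copy.retrieval_mode === 'multiple')
    delete copy.single_retrieval_config
  else
    delete copy.multiple_retrieval_config
  return copy
}
```

Calling `stripUnusedConfig` just before the payload is sent would give a single place to check that only one of the two configs is present.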

Additionally, when changing the retrieval mode, the code ensures that the correct configuration is set:

const handleRetrievalModeChange = useCallback((newMode: RETRIEVE_TYPE) => {
  const newInputs = produce(inputs, (draft) => {
    draft.retrieval_mode = newMode
    if (newMode === RETRIEVE_TYPE.multiWay) {
      draft.multiple_retrieval_config = {
        top_k: draft.multiple_retrieval_config?.top_k || DATASET_DEFAULT.top_k,
        score_threshold: draft.multiple_retrieval_config?.score_threshold,
        reranking_model: !draft.multiple_retrieval_config?.reranking_model?.provider
          ? {
            provider: rerankDefaultModel?.provider?.provider || '',
            model: rerankDefaultModel?.model || '',
          }
          : draft.multiple_retrieval_config?.reranking_model,
      }
    } else {
      const hasSetModel = draft.single_retrieval_config?.model?.provider
      if (!hasSetModel) {
        draft.single_retrieval_config = {
          model: {
            provider: currentProvider?.provider || '',
            name: currentModel?.model || '',
            mode: currentModel?.model_properties?.mode as string,
            completion_params: {},
          },
        }
      }
    }
  })
  setInputs(newInputs)
}, [currentModel?.model, currentModel?.model_properties?.mode, currentProvider?.provider, inputs, rerankDefaultModel?.model, rerankDefaultModel?.provider?.provider, setInputs])

By ensuring that only the necessary fields are included in the inputs, you can avoid the validation errors related to extra inputs [1][2].

To continue talking to Dosu, mention @dosu.

xddcode commented 1 week ago

@dosu, full-text search mode now works without error. The error above occurs in vector retrieval mode.

dosubot[bot] commented 1 week ago

To resolve the validation errors related to extra inputs in vector retrieval mode in Dify version 0.6.11, ensure that the handleChange function correctly updates the multiple_retrieval_config without including any extra inputs. Verify that the DatasetConfigs object passed to handleChange does not contain any unexpected properties.

Here's the relevant part of the code that handles the configuration update:

const handleChange = useCallback((configs: DatasetConfigs, isRetrievalModeChange?: boolean) => {
  if (isRetrievalModeChange) {
    onRetrievalModeChange(configs.retrieval_model)
    return
  }
  onMultipleRetrievalConfigChange({
    top_k: configs.top_k,
    score_threshold: configs.score_threshold_enabled ? (configs.score_threshold || DATASET_DEFAULT.score_threshold) : null,
    reranking_model: payload.retrieval_mode === RETRIEVE_TYPE.oneWay
      ? undefined
      : (!configs.reranking_model?.reranking_provider_name
        ? {
          provider: rerankDefaultModel?.provider?.provider || '',
          model: rerankDefaultModel?.model || '',
        }
        : {
          provider: configs.reranking_model?.reranking_provider_name,
          model: configs.reranking_model?.reranking_model_name,
        }),
  })
}, [onMultipleRetrievalConfigChange, payload.retrieval_mode, rerankDefaultModel?.provider?.provider, rerankDefaultModel?.model, onRetrievalModeChange])

Ensure that the DatasetConfigs object only includes the necessary properties (top_k, score_threshold, score_threshold_enabled, reranking_model, and retrieval_model). If there are any extra properties, they should be removed before calling handleChange.
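One possible way to strip unexpected properties is an allow-list filter, sketched below. The helper names `pickAllowed` and `extraKeys` are hypothetical, and the key list is copied from the paragraph above rather than from Dify's schema, so treat it as an assumption.

```typescript
// Keys taken from the paragraph above; an assumption, not Dify's actual schema.
const ALLOWED_KEYS = [
  'top_k',
  'score_threshold',
  'score_threshold_enabled',
  'reranking_model',
  'retrieval_model',
] as const

type AllowedKey = typeof ALLOWED_KEYS[number]

// Hypothetical helper: copy only the allowed keys from an arbitrary object.
function pickAllowed(raw: Record<string, unknown>): Partial<Record<AllowedKey, unknown>> {
  const out: Partial<Record<AllowedKey, unknown>> = {}
  for (const key of ALLOWED_KEYS) {
    if (key in raw)
      out[key] = raw[key]
  }
  return out
}

// Hypothetical helper: list the keys that pickAllowed would drop, e.g. for logging.
function extraKeys(raw: Record<string, unknown>): string[] {
  const allowed = ALLOWED_KEYS as readonly string[]
  return Object.keys(raw).filter(k => allowed.indexOf(k) === -1)
}
```

Logging the result of `extraKeys` before the request is made may help identify which field triggers the validation error.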

Additionally, refer to the Panel component in web/app/components/workflow/nodes/knowledge-retrieval/panel.tsx to understand how the configuration is managed and ensure that the handleRetrievalModeChange and handleMultipleRetrievalConfigChange functions are correctly implemented [1][2].
