langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

shapes (1536,) and (3072,) not aligned: 1536 (dim 0) != 3072 (dim 0) on knowledge retrieval testing #8346

Closed TaylorHere closed 2 months ago

TaylorHere commented 2 months ago

Dify version

0.8.0

Cloud or Self Hosted

Cloud

Steps to reproduce

Run knowledge retrieval testing on any knowledge base. Running any workflow that contains a knowledge retrieval node also fails with a 'NoneType' object has no attribute 'provider' error.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 2 months ago

To resolve the error "shapes (1536,) and (3072,) not aligned: 1536 (dim 0) != 3072 (dim 0)" during knowledge retrieval testing in Dify version 0.8.0, you need to ensure that the vectors used in the retrieval process have matching dimensions. This error typically means the query embedding and the stored index embeddings come from models with different output sizes, for example a knowledge base indexed with a 3072-dimension embedding model but queried with a 1536-dimension one.
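As a minimal illustration outside Dify (plain NumPy, not Dify code), this is exactly the situation the traceback describes: a 1536-dimension query vector scored against a 3072-dimension stored vector:

import numpy as np

# Illustration only: a query embedded with a 1536-dimension model cannot be
# scored against vectors that were indexed with a 3072-dimension model.
query_vector = np.random.rand(1536)   # e.g. text-embedding-ada-002 / text-embedding-3-small output size
stored_vector = np.random.rand(3072)  # e.g. text-embedding-3-large output size

try:
    np.dot(query_vector, stored_vector)
except ValueError as exc:
    # Prints: shapes (1536,) and (3072,) not aligned: 1536 (dim 0) != 3072 (dim 0)
    print(exc)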

Here are the steps to fix this issue:

  1. Check the Embedding Dimensions: Ensure that the embedding model used at query time produces vectors of the same dimension as those stored when the knowledge base was indexed; the knowledge retrieval code here expects 1536-dimension vectors [1]. A standalone sanity check is sketched after this list.

  2. Update the Model Configuration: Ensure that the model configuration used for retrieval is correctly set up with the appropriate embedding dimensions.
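If it helps to confirm the mismatch outside Dify first, a small sanity check along these lines compares the output sizes of two embedding models. The OpenAI client and the model names are only an assumed example of a 1536- vs 3072-dimension pair; substitute whatever embedding models your knowledge base and retrieval settings actually use:

from openai import OpenAI

client = OpenAI()
probe = "dimension check"

# Embed the same probe text with the model assumed at index time and the model
# assumed at query time, then compare the vector lengths.
index_dim = len(client.embeddings.create(model="text-embedding-3-large", input=probe).data[0].embedding)
query_dim = len(client.embeddings.create(model="text-embedding-ada-002", input=probe).data[0].embedding)

if index_dim != query_dim:
    print(f"Mismatch: index vectors have {index_dim} dimensions, query vectors have {query_dim}")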

Here is a snippet of the code where you might need to check and update the embedding dimensions:

def _fetch_model_config(
    self, node_data: KnowledgeRetrievalNodeData
) -> tuple[ModelInstance, ModelConfigWithCredentialsEntity]:
    model_name = node_data.single_retrieval_config.model.name
    provider_name = node_data.single_retrieval_config.model.provider

    model_manager = ModelManager()
    model_instance = model_manager.get_model_instance(
        tenant_id=self.tenant_id, model_type=ModelType.LLM, provider=provider_name, model=model_name
    )

    provider_model_bundle = model_instance.provider_model_bundle
    model_type_instance = model_instance.model_type_instance
    model_type_instance = cast(LargeLanguageModel, model_type_instance)

    model_credentials = model_instance.credentials

    # Check model
    provider_model = provider_model_bundle.configuration.get_provider_model(
        model=model_name, model_type=ModelType.LLM
    )

    if provider_model is None:
        raise ValueError(f"Model {model_name} not exist.")
    if provider_model.status == ModelStatus.NO_CONFIGURE:
        raise ProviderTokenNotInitError(f"Model {model_name} credentials is not initialized.")
    elif provider_model.status == ModelStatus.NO_PERMISSION:
        raise ModelCurrentlyNotSupportError(f"Dify Hosted OpenAI {model_name} currently not support.")
    elif provider_model.status == ModelStatus.QUOTA_EXCEEDED:
        raise QuotaExceededError(f"Model provider {provider_name} quota exceeded.")

    # Model config
    completion_params = node_data.single_retrieval_config.model.completion_params
    stop = []
    if "stop" in completion_params:
        stop = completion_params["stop"]
        del completion_params["stop"]

    # Get model mode
    model_mode = node_data.single_retrieval_config.model.mode
    if not model_mode:
        raise ValueError("LLM mode is required.")

    model_schema = model_type_instance.get_model_schema(model_name, model_credentials)

    if not model_schema:
        raise ValueError(f"Model {model_name} not exist.")

    # Ensure embedding dimensions are consistent
    embedding_dimension = model_schema.get("embedding_dimension")
    expected_embedding_dimension = 1536  # Set the expected dimension
    if embedding_dimension != expected_embedding_dimension:
        raise ValueError(f"Embedding dimension mismatch: expected {expected_embedding_dimension}, got {embedding_dimension}")

    return model_instance, ModelConfigWithCredentialsEntity(
        provider=provider_name,
        model=model_name,
        model_schema=model_schema,
        mode=model_mode,
        provider_model_bundle=provider_model_bundle,
        credentials=model_credentials,
        parameters=completion_params,
        stop=stop,
    )

In this snippet, expected_embedding_dimension is set to 1536, so a mismatched embedding size is caught and reported explicitly instead of surfacing later as the shape-mismatch error during retrieval [2].
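The same idea can be expressed as a self-contained guard, independent of Dify's internals. This is a generic sketch of a similarity scorer that validates dimensions up front rather than letting NumPy raise the shape error mid-retrieval:

import numpy as np

def cosine_scores(query: np.ndarray, index: np.ndarray) -> np.ndarray:
    # index has shape (num_docs, dim); query has shape (dim,)
    if query.shape[0] != index.shape[1]:
        raise ValueError(
            f"Embedding dimension mismatch: index stores {index.shape[1]}-dim vectors, "
            f"query is {query.shape[0]}-dim; re-index the knowledge base or switch the embedding model."
        )
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    return index_norm @ query_norm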

For the 'NoneType' object has no attribute 'provider' error, ensure that the single_retrieval_config object is properly initialized and contains the necessary attributes. The single_retrieval_config object should have a model attribute of type ModelConfig, which includes the provider and other necessary details [3][4].
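A hypothetical defensive check along these lines (illustrative only, not the actual Dify fix) would turn the opaque attribute error into an actionable message when the node's model configuration is missing:

def _validate_single_retrieval_config(node_data) -> None:
    # If single_retrieval_config or its model is None, accessing .model.provider
    # later raises "'NoneType' object has no attribute 'provider'".
    config = getattr(node_data, "single_retrieval_config", None)
    if config is None or getattr(config, "model", None) is None:
        raise ValueError(
            "Knowledge retrieval node has no model configuration; "
            "re-select the model in the node settings and save again."
        )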

To continue talking to Dosu, mention @dosu.

crazywoola commented 2 months ago

Please provide your knowledge base settings and the system model provider settings.

TaylorHere commented 2 months ago

[screenshot attached]

TaylorHere commented 2 months ago

After re-applying the same settings, the error disappeared.