langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Embedding Model in YAML Not Updated After Changing Knowledge Context in Dify #8306

Closed · kikumoto closed this 1 month ago

kikumoto commented 1 month ago

Self Checks

Dify version

0.8.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Create two Knowledges in Dify:
     a. Specify Bedrock's cohere.embed-multilingual-v3 as the Embedding Model (Knowledge A)
     b. Specify Azure OpenAI Service's text-embedding-3-large as the Embedding Model (Knowledge B)

  2. In Studio, create a Chatbot - Basic:
     a. Add only Knowledge A in the Context
     b. Publish the chatbot
     c. Export DSL

At this stage, the YAML file contains the following settings:

model_config.dataset_configs.reranking_mode: weighted_score
model_config.dataset_configs.weights:vector_setting.embedding_model_name: cohere.embed-multilingual-v3
model_config.dataset_configs.weights:vector_setting.embedding_provider_name: bedrock
  3. Next:
     a. Remove Knowledge A from the Context
     b. Add Knowledge B to the Context
     c. Publish the chatbot
     d. Export DSL again

At this stage, the YAML file remains unchanged:

model_config.dataset_configs.reranking_mode: weighted_score
model_config.dataset_configs.weights:vector_setting.embedding_model_name: cohere.embed-multilingual-v3
model_config.dataset_configs.weights:vector_setting.embedding_provider_name: bedrock
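
For reference, the flattened paths above should correspond to a nested section of the exported DSL along these lines (a sketch reconstructed only from the paths quoted in this issue, not a verbatim export; surrounding keys are omitted):

```yaml
model_config:
  dataset_configs:
    reranking_mode: weighted_score
    weights:
      vector_setting:
        embedding_provider_name: bedrock
        embedding_model_name: cohere.embed-multilingual-v3
```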

Additionally, the UI does not provide a way to edit the embedding_model_name in the Retrieval Setting.

✔️ Expected Behavior

This behavior is problematic when switching between different Embedding Models, a common practice when experimenting. The expected outcome should be:

model_config.dataset_configs.reranking_mode: weighted_score
model_config.dataset_configs.weights:vector_setting.embedding_model_name: text-embedding-3-large
model_config.dataset_configs.weights:vector_setting.embedding_provider_name: azure_openai
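
A quick way to check whether a fresh export picked up the new Knowledge's embedding settings is to read the DSL file directly. The sketch below assumes the nesting implied by the flattened paths above; the file name and expected values are placeholders:

```python
# Minimal sketch: check which embedding model an exported Dify DSL file references.
# The nesting mirrors the flattened paths quoted in this issue; adjust if your
# export differs. Requires PyYAML (pip install pyyaml).
import yaml

EXPECTED_PROVIDER = "azure_openai"           # placeholder: provider of Knowledge B
EXPECTED_MODEL = "text-embedding-3-large"    # placeholder: model of Knowledge B

with open("exported_app.yml", "r", encoding="utf-8") as f:  # placeholder file name
    dsl = yaml.safe_load(f)

vector_setting = dsl["model_config"]["dataset_configs"]["weights"]["vector_setting"]
provider = vector_setting["embedding_provider_name"]
model = vector_setting["embedding_model_name"]

if (provider, model) != (EXPECTED_PROVIDER, EXPECTED_MODEL):
    print(f"Stale embedding settings in export: {provider} / {model}")
else:
    print("Export matches the embedding model of the attached Knowledge.")
```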

❌ Actual Behavior

No response

dosubot[bot] commented 1 month ago

Hello, @kikumoto! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's work together to resolve the issue you're facing.

I found a similar closed issue that might be relevant to your problem: "Unexpected problems encountered in the use of the knowledge base". That issue was referenced by another issue, "After adding the built-in application on the official website to the workspace, the model modification cannot be completed", which was last updated on June 30, 2024. The suggested solution was to refer to a duplicate issue: https://github.com/langgenius/dify/issues/2721 [1].

Regarding your specific issue, the YAML file is not updated after changing the knowledge context in Dify, specifically when switching between different embedding models, because the knowledge in Community Edition gets stuck in "Queued" when Q&A segmentation mode is enabled. This issue can occur if the rate limit has been reached for the Embedding model API key used. Additionally, changing the embedding model will re-embed all documents in the knowledge base, and the original embeddings will be deleted, which might cause delays or issues in updating the YAML file [2][3][4].

Please check if the rate limit has been reached for the Embedding model API key you are using. If the rate limit is not the issue, you might need to wait for the re-embedding process to complete before the YAML file gets updated.

To continue talking to Dosu, mention @dosu.

crazywoola commented 1 month ago

This is fixed in #8183 :)

kikumoto commented 1 month ago

@dosu We are not encountering a rate limit issue. Also, waiting does not resolve the problem.

Could there be an issue with the payload when calling the api/apps/xxxxxx/model-config API? In the payload, dataset_configs -> reranking_mode and dataset_configs -> weights -> vector_setting -> embedding_model_name have not been changed from the values initially registered for the knowledge base, so I suspect that the database side is not being updated.

Could you please review the situation once again?
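
For illustration, if the publish request really does carry the stale values, the relevant fragment of that model-config payload would look roughly like the following (a sketch based only on the key names mentioned above; all other fields are omitted). It mirrors the exported DSL sketch earlier, which is the point of the suspicion: the same stale values would flow from the request into the stored config and back out in the export.

```json
{
  "dataset_configs": {
    "reranking_mode": "weighted_score",
    "weights": {
      "vector_setting": {
        "embedding_provider_name": "bedrock",
        "embedding_model_name": "cohere.embed-multilingual-v3"
      }
    }
  }
}
```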