langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

The update process for the dataset_keyword_tables table is extremely slow. #11146

Open imaben opened 1 week ago

imaben commented 1 week ago

Dify version

0.10

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

As the number of documents in a knowledge base grows, indexing becomes extremely slow. After investigating, I traced the main bottleneck to updates of the dataset_keyword_tables table. My initial analysis suggests the cause is that every write is a full update rather than an incremental one. Is it possible to disable write operations to this table, or are there any plans to improve this in a future release?

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

dosubot[bot] commented 1 week ago

Hey @imaben! 👋 I'm here to help you with any bugs, questions, or contributions you have in mind. Let's work together to resolve this issue.

I found a similar unsolved discussion titled "How to disable keywords?", which might be relevant to your issue [1].

Regarding your question about disabling write operations to the dataset_keyword_tables table, I couldn't find any existing configurations or settings in Dify that allow for this. Additionally, there are no specific details available about planned updates or improvements addressing performance issues related to this table.

If you are looking for optimizations, you might consider reviewing the indexing process, which involves keyword extraction, updating the keyword table, and saving it back to the database. This process uses a Redis lock to ensure thread safety, which might be a point to investigate for performance improvements [2].
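
The steps described above (extract keywords, update the keyword table, save it back, all under a lock) can be sketched roughly as follows. This is only an illustration of why a full read-modify-write gets slower as the table grows, not Dify's actual code: `fake_db`, `fake_lock`, `extract_keywords`, and `add_segment` are hypothetical names, a plain dict stands in for the dataset_keyword_tables table, and a `threading.Lock` stands in for the Redis lock.

```python
import json
import threading

# Hypothetical stand-ins: a dict plays the role of the dataset_keyword_tables
# table, and a threading.Lock plays the role of the Redis lock.
fake_db = {}
fake_lock = threading.Lock()


def extract_keywords(text, max_keywords=5):
    """Toy keyword extractor: the longest distinct words in the text."""
    words = {w.strip(".,").lower() for w in text.split()}
    return sorted(words, key=lambda w: (-len(w), w))[:max_keywords]


def add_segment(dataset_id, segment_id, text):
    """Read-modify-write of the whole keyword table for one new segment.

    Returns the number of characters written, which grows with the table.
    """
    with fake_lock:
        # 1. Load and deserialize the full keyword table.
        table = json.loads(fake_db.get(dataset_id, "{}"))
        # 2. Update it with the keywords of the new segment.
        for keyword in extract_keywords(text):
            ids = table.setdefault(keyword, [])
            if segment_id not in ids:
                ids.append(segment_id)
        # 3. Serialize and save the full table back.
        payload = json.dumps(table)
        fake_db[dataset_id] = payload
        return len(payload)


if __name__ == "__main__":
    sizes = [
        add_segment("dataset-1", f"seg-{i}", f"document{i} mentions topic{i}")
        for i in range(1000)
    ]
    # The payload written for the last segment is far larger than for the first.
    print(sizes[0], sizes[-1])
```

In this sketch every new segment pays for loading and rewriting everything indexed before it, and all writers for a dataset wait on the same lock, so the total cost grows roughly quadratically with the number of segments. If the real process behaves similarly, that would be consistent with the slowdown reported in this issue.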
