yorkie-team / codepair

Build your own AI-powered collaborative markdown editor in just 5 minutes
https://codepair.yorkie.dev
Apache License 2.0
46 stars, 19 forks

Add Semantic Search Feature #305

Open devleejb opened 3 weeks ago

devleejb commented 3 weeks ago

What would you like to be added:

I propose adding a Semantic Search feature that lets users search and retrieve documents by meaning rather than by exact keyword match. This should improve the relevance of search results beyond what traditional keyword matching provides. The conceptual architecture and workflow are illustrated in the included images.

Key Decisions Needed:

  1. When to save/update documents in the Vector Store?

    • Options:
      • Every time a document is updated
      • Periodically through a Cron Job
      • After a set duration without updates (e.g., 10 minutes)
      • Initially embed large documents, then embed smaller updates, with periodic consolidation.
  2. How to store existing data in the Vector Store during feature deployment?

  3. Chunking Strategy:

    • Different chunking methods have advantages and disadvantages, including:
      • Parent-Child Chunking
      • Fixed Chunking
      • Other strategies
  4. Embedding Model:

    • What model should we use for embedding?
    • Relying on commercial embedding APIs such as OpenAI's may be costly, because documents will need to be re-embedded frequently.
    • Locally hosted options such as Ollama, or smaller models, may be sufficient and are worth exploring.
  5. Vector Store Considerations:

    • Candidate Vector Stores (the numbers in parentheses appear to be approximate GitHub star counts):
      • Milvus (29k)
      • Weaviate (10k)
      • Chroma (14k)
      • Faiss (30k)
    • Need for features like Namespace to support separation by Workspace for better data management.
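
To make the Fixed Chunking and per-Workspace Namespace points above more concrete, here is a minimal in-memory sketch in TypeScript. It is only an illustration under stated assumptions, not a proposed implementation. Every name in it (`fixedChunks`, `toyEmbed`, `NamespacedVectorStore`) is hypothetical and belongs to neither CodePair nor any vector store SDK. `toyEmbed` is a non-semantic placeholder standing in for a real embedding model, and the chunk size of 200 characters with a 40-character overlap is arbitrary.

```typescript
// Hypothetical sketch: none of these names come from CodePair or a real vector store.
type Vector = number[];

interface Chunk {
  documentId: string;
  text: string;
  vector: Vector;
}

// Fixed chunking: windows of `size` characters, each overlapping the previous by `overlap`.
function fixedChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}

// Placeholder embedding (NOT semantic): hashes words into a small bag-of-words vector.
// A real deployment would call an embedding model here instead.
function toyEmbed(text: string, dims: number = 32): Vector {
  const v: Vector = new Array(dims).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let h = 0;
    for (let i = 0; i < word.length; i++) {
      h = (h * 31 + word.charCodeAt(i)) % dims;
    }
    v[h] += 1;
  }
  return v;
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// One namespace per workspace, so a search never crosses workspace boundaries.
class NamespacedVectorStore {
  private namespaces: Record<string, Chunk[]> = {};

  // Re-embedding a document replaces all of its previous chunks in that workspace.
  upsertDocument(workspaceId: string, documentId: string, text: string): void {
    const kept = (this.namespaces[workspaceId] ?? []).filter(
      (c) => c.documentId !== documentId
    );
    const fresh = fixedChunks(text, 200, 40).map((t) => ({
      documentId,
      text: t,
      vector: toyEmbed(t),
    }));
    this.namespaces[workspaceId] = kept.concat(fresh);
  }

  // Returns up to `topK` chunks from the given workspace, best match first.
  search(workspaceId: string, query: string, topK: number = 3): Chunk[] {
    const q = toyEmbed(query);
    return (this.namespaces[workspaceId] ?? [])
      .map((chunk) => ({ chunk, score: cosine(q, chunk.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((r) => r.chunk);
  }
}
```

In this sketch, calling `upsertDocument` again for the same document corresponds to the "every time a document is updated" option. The other update strategies would change only when that call is made, not how chunks are stored or searched.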

Why is this needed:

Integrating a Semantic Search feature will significantly enhance user experience by providing more relevant and efficient search capabilities.

Additional Information:

devleejb commented 3 weeks ago

image

sihyeong671 commented 2 weeks ago

Here is a list of resources that are useful to read when adding this feature.

devleejb commented 3 days ago

This feature can be useful for resolving this issue: https://github.com/yorkie-team/yorkie/issues/1002