magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0
LLM Powered Search Engine #3503

Open t83714 opened 7 months ago

t83714 commented 7 months ago

LLM Powered Search Engine

This epic covers adding an LLM (large language model) powered search engine to the open-source Magda codebase, alongside the existing keyword-based search engines.

As an epic, this ticket gives an overview of the problem we are trying to solve.

1. Motivation

We need a vector store / search engine to support LLM embedding-based indexing and searching.

2. Indexing Strategy

3. Vector Store

We will use the OpenSearch 2.x knn_vector field type. Why?
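Separately from the rationale, here is a minimal sketch of what an index with a knn_vector field and a matching k-NN query could look like. It only builds the JSON request bodies and does not contact a cluster. The field names (`embedding`, `title`), the helper names and the vector dimension are assumptions made for illustration, not part of the Magda design.

```python
import json

# Hypothetical field name for the embedding vector; not taken from Magda.
EMBEDDING_FIELD = "embedding"


def build_index_body(dimension: int, field: str = EMBEDDING_FIELD) -> dict:
    """Build an index-creation body with one knn_vector field.

    `dimension` must match the output size of the embedding model in use.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        # Enables k-NN search structures for this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Illustrative text field kept next to the vector.
                "title": {"type": "text"},
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def build_knn_query(vector, k: int = 10, field: str = EMBEDDING_FIELD) -> dict:
    """Build an approximate k-NN search body for a query embedding."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


if __name__ == "__main__":
    # A toy dimension of 4 keeps the printed output short.
    print(json.dumps(build_index_body(4), indent=2))
    print(json.dumps(build_knn_query([0.1, 0.2, 0.3, 0.4], k=5), indent=2))
```

The first body would be sent when creating the index and the second to the index's `_search` endpoint. Engine, distance metric and HNSW parameters are left at their defaults here and would need to be chosen for the actual embedding model.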

4. Indexing Module / Microservice

We need to add a new module to the platform, built on Magda's minion framework.

Some of the design has already been covered by tickets I created for the AI4M data-sharing platform:

This ticket, however, targets more generic use cases and will become the common base/facility for all Magda-based projects.