
soilwise-he / natural-language-querying

Application component that provides Natural Language Querying (NLQ) services, making knowledge stored in a graph database accessible for e.g. a ChatBot UI.

MIT License


For RAG-based NLQ, discuss the vector embeddings and vector store to be used #4

Closed: robknapen closed this issue 1 month ago

robknapen commented 2 months ago

RAG typically requires generating vector embeddings from the data (documents, databases, or KGs); these embeddings later provide contextual information to the LLM. Generating embeddings takes considerable compute time (depending on the model used), so it is practical to store them in a vector database, or at least in a store/database that offers similarity search over embeddings. Such a vector store needs to be hosted/installed as part of SoilWise.
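To illustrate why precomputed embeddings are worth persisting, here is a minimal sketch of what a vector store does: embed each text once at ingestion, keep the vectors, and answer queries by cosine similarity. `toy_embed` is a hypothetical stand-in (a hashed bag of words) for a real embedding model, and `InMemoryVectorStore` is illustrative only; neither is the Milvus API nor any SoilWise code.

```python
import math
import re
import zlib


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model (NOT a real model).

    Hashes each lower-cased word into one of `dim` buckets and
    L2-normalises the result, so a dot product equals cosine similarity.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for empty text
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Illustrative store: embeddings are computed once, at ingestion time."""

    def __init__(self, embed):
        self._embed = embed
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        # The (expensive) embedding call happens here, once per document.
        self._items.append((text, self._embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Only the query is embedded at search time; stored vectors are reused.
        q = self._embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore(toy_embed)
    for doc in [
        "Soil organic carbon stocks in European croplands",
        "Groundwater nitrate monitoring network",
        "Metadata catalogue for soil datasets",
    ]:
        store.add(doc)
    # The retrieved texts would be passed to the LLM as context.
    print(store.search("organic carbon in soil", k=1))
```

A real deployment would replace `toy_embed` with an actual embedding model and the linear scan (O(n) per query) with the vector store's indexed nearest-neighbour search.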

robknapen commented 1 month ago

Virtuoso does not seem to have vector embedding capabilities, while Neo4j does (in general, Neo4j seems more invested in NLP/LLM integrations). PostgreSQL can also handle vector embeddings (via the pgvector extension). Another option would be a stand-alone vector store, such as Milvus.

Metaphacts seems to have experimented with combining RDF and vector spaces: paper

Since we are considering Elasticsearch: it now also supports embeddings: link

robknapen commented 1 month ago

Considering issue #2, the vector store will be local/internal to the NLQ component only, so for now I will select Milvus for it.