Emacs as your personal AI assistant. Use LLMs such as ChatGPT or LLaMA for text generation, or DALL-E and Stable Diffusion for image generation. Also supports speech input/output.
TL;DR
Retrieval mechanisms can be used to provide a kind of memory for an LLM. This memory can then be used for semantic search and QA based on retrieved code/documentation fragments.
Context
Using vector DBs/stores, it is possible to build features that go beyond the model's context size, such as semantic search and question answering over the whole codebase.
For example, langchain's VectorDBQA works like this:
for a given set of documents, it computes their embeddings using some model [^1]
it stores the vectors in a database
given a query, it retrieves the documents with the most similar embeddings and generates the answer with an LLM based on them
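As an illustration, here is a minimal sketch of that retrieve-then-answer flow using sentence-transformers for embeddings and a plain in-memory index; `ask_llm` is a hypothetical placeholder for whatever LLM call is actually used:

```python
# Minimal sketch of the retrieve-then-answer flow.
# ask_llm is a hypothetical placeholder, not a real org-ai function.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, cheap local model

documents = ["(defun org-ai-find (query) ...)",
             "org-ai-setup-memory connects to the vector DB ..."]
doc_vecs = model.encode(documents, normalize_embeddings=True)

def retrieve(query, k=3):
    """Return the k documents whose embeddings are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [documents[i] for i in np.argsort(-scores)[:k]]

def answer(query):
    """Stuff the retrieved fragments into the prompt and ask the LLM."""
    context = "\n---\n".join(retrieve(query))
    return ask_llm(f"Answer using this context:\n{context}\n\nQuestion: {query}")
```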
Features
org-ai-find - like searching with a regex, but finds fragments whose embeddings are similar to the query.
org-ai-ask - answers a given query using the documents retrieved in the previous step
versions of the above org-ai-ask, but with filters such as (see the combined sketch after this feature list):
matching by regex instead of semantic search
file extensions
documentation
selected files or code fragments
files from a different project
org-ai-setup-memory - initializes the vector DB connection.
The first version will connect to a pinecone/qdrant instance via its API.
I don't think using OpenAI embeddings makes sense, as they're really costly.
In the future we could think about setting up a local DB (possible with ChromaDB, for example), but this would mean running an external process, because vector DBs usually only ship clients in Python.
org-ai-remember-project - indexes the current project in the vector DB.
The command will send documents to an external service that embeds them and stores them in the vector database.
It will need a few parameters (with reasonable defaults).
Chunk size is the most important one: documents will be split into chunks, and when using embeddings it is extremely important that the chunks are not too long.
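To make these commands concrete, here is a minimal sketch of the Python side that could back them, assuming a local qdrant instance and sentence-transformers for embeddings. The collection name, chunking scheme, and payload fields are illustrative assumptions, not a settled design:

```python
# Sketch of a backend for org-ai-setup-memory / org-ai-remember-project /
# org-ai-find. Collection name, chunk size and payload fields are assumptions.
from pathlib import Path
from qdrant_client import QdrantClient
from qdrant_client.models import (Distance, FieldCondition, Filter,
                                  MatchValue, PointStruct, VectorParams)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors

# org-ai-setup-memory: connect to qdrant and create the collection.
client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="org-ai-memory",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def chunk(text, size=1000):
    """Naive fixed-size chunking; short chunks keep embeddings discriminative."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# org-ai-remember-project: embed every chunk of every file and store it.
def remember_project(root):
    points, idx = [], 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        for piece in chunk(path.read_text(errors="ignore")):
            points.append(PointStruct(
                id=idx,
                vector=model.encode(piece).tolist(),
                payload={"file": str(path), "ext": path.suffix, "text": piece},
            ))
            idx += 1
    client.upsert(collection_name="org-ai-memory", points=points)

# org-ai-find, optionally restricted to one file extension.
def find(query, ext=None, k=5):
    flt = (Filter(must=[FieldCondition(key="ext", match=MatchValue(value=ext))])
           if ext else None)
    hits = client.search(collection_name="org-ai-memory",
                         query_vector=model.encode(query).tolist(),
                         query_filter=flt, limit=k)
    return [h.payload for h in hits]
```

The payload filter is how the extension/file/project filters listed above could be implemented without a second index.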
What needs to be done
For the first version there are two ways:
Use an external Python library and store the index on disk
This is the simplest approach; I already have code that does most of the work.
I can make a PyPI package with the needed functions, and the Python results will then be passed to Emacs.
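As a purely illustrative sketch of this route (paths, names, and documents are placeholders), ChromaDB can persist the index on disk and compute embeddings with a built-in local model by default:

```python
# Sketch of the library-based route: ChromaDB keeps the index on disk and
# embeds documents locally by default. Path and names are placeholders.
import chromadb

client = chromadb.PersistentClient(path="./org-ai-memory")
collection = client.get_or_create_collection("project")

# Index some chunks (ids must be unique strings).
collection.add(
    ids=["readme-0", "core-el-0"],
    documents=["org-ai lets you use LLMs from Emacs ...",
               "(defun org-ai-find (query) ...)"],
    metadatas=[{"file": "README.md"}, {"file": "org-ai.el"}],
)

# Semantic search: Chroma embeds the query with the same model.
results = collection.query(query_texts=["how do I search the codebase?"],
                           n_results=2)
print(results["documents"][0])
```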
Use an external service
define an API for retrieval
I will propose an API for a memory service that handles indexing documents and finding code fragments with similar embeddings (a sketch follows below).
integrate the LM with the retrieval mechanism
write code for spinning up a Docker container with the service
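One hypothetical shape for that API, sketched with FastAPI just to make the discussion concrete; the endpoint names and request fields are assumptions, not a settled proposal:

```python
# Hypothetical shape for the memory service API; endpoint names and
# request fields are illustrative assumptions only.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class IndexRequest(BaseModel):
    project: str
    documents: list[str]  # pre-chunked fragments sent by Emacs

class QueryRequest(BaseModel):
    project: str
    query: str
    k: int = 5

@app.post("/index")
def index(req: IndexRequest):
    # embed req.documents and upsert them into the vector DB
    ...
    return {"indexed": len(req.documents)}

@app.post("/query")
def query(req: QueryRequest):
    # embed req.query, return the k most similar fragments
    ...
    return {"fragments": []}
```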
Problems
It's not clear to me which of the proposed approaches is cleaner/more Emacs-like. I think I'd be able to ship a basic version that uses Python by the end of next week.
In the future it's quite possible that a standard API for such memory services will emerge.
[^1]: e.g. sentence-transformers, which is much cheaper than OpenAI embeddings; see this article
This would be super++ awesome! Feel free to use whatever language makes most sense for you. Happy to transition things into elisp later where suitable.