rksm / org-ai

Emacs as your personal AI assistant. Use LLMs such as ChatGPT or LLaMA for text generation or DALL-E and Stable Diffusion for image generation. Also supports speech input / output.
GNU General Public License v3.0

LM memory AKA vector databases #27

Open lambdaofgod opened 1 year ago

lambdaofgod commented 1 year ago

TL;DR

Retrieval mechanisms can be used to provide a kind of memory for an LLM. This memory can then be used for semantic search and QA based on retrieved code/documentation fragments.

Context

Using vector DBs/stores, it is possible to build features that go beyond the model's context size, such as semantic search and question answering over the whole codebase.

For example, langchain's VectorDBQA works like this:
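In rough outline, a VectorDBQA-style chain embeds the question, retrieves the nearest indexed fragments, and stuffs them into the prompt for the LLM. Here is a minimal, dependency-free sketch of that loop; the corpus, the bag-of-words `embed` stand-in, and all function names are illustrative only (a real setup would call sentence-transformers or the OpenAI embeddings API, and an actual LLM in `answer`):

```python
import math

# Toy corpus of code/doc fragments to index (illustrative only).
FRAGMENTS = [
    "org-ai-prompt sends the region to the chat endpoint",
    "images are generated with DALL-E via org-ai-image",
    "speech input uses whisper for transcription",
]

def embed(text):
    """Stand-in embedding: bag-of-words vector over the corpus vocabulary.
    A real system would call an embedding model here."""
    vocab = sorted({w for frag in FRAGMENTS for w in frag.lower().split()})
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# The "vector store": each fragment stored alongside its embedding.
INDEX = [(frag, embed(frag)) for frag in FRAGMENTS]

def retrieve(question, k=1):
    """Return the k fragments whose embeddings are closest to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: -cosine(q, item[1]))
    return [frag for frag, _ in ranked[:k]]

def answer(question):
    """Stuff retrieved fragments into the prompt; an LLM call would go here."""
    context = "\n".join(retrieve(question, k=2))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A query like `retrieve("how is speech input transcribed")` then returns the whisper fragment, which would be handed to the model as context.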

Features

What needs to be done

For the first version, there are two options:

Use external Python library and store index on disk

This is the simplest approach; I already have code that does most of the work. I can publish a PyPI package with the needed functions, and the Python results will then be passed to Emacs.
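Storing the index on disk could be as simple as serializing the (fragment, embedding) pairs and reloading them between Emacs sessions. A toy sketch, assuming JSON as the format and a hypothetical file name (real vector libraries such as faiss or hnswlib use their own binary formats):

```python
import json
import tempfile
from pathlib import Path

def save_index(index, path):
    """Persist (fragment, vector) pairs as JSON on disk."""
    Path(path).write_text(json.dumps(index))

def load_index(path):
    """Reload the pairs; JSON turns tuples into lists, so rebuild them."""
    return [(frag, vec) for frag, vec in json.loads(Path(path).read_text())]

# Tiny example: two fragments with fake 3-d embeddings.
index = [
    ("org-ai chat completion helper", [1.0, 0.0, 0.2]),
    ("vector store persistence", [0.1, 1.0, 0.0]),
]
# Hypothetical on-disk location; a real package would pick a cache dir.
index_file = Path(tempfile.mkdtemp()) / "org-ai-memory.json"
save_index(index, index_file)
```

Emacs would then only need to shell out to the Python package, which loads this file and returns search results.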

Use external service
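If the memory lives in an external service, Emacs would talk to it over HTTP instead of invoking Python directly. A toy sketch of what such an endpoint might look like, using only the standard library; the `/search` route, the response shape, and the word-overlap scoring (a stand-in for real embedding similarity) are all hypothetical:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse, parse_qs

# Toy in-memory "index" the service searches over.
DOCS = [
    "defun org-ai-complete: completion helper",
    "vector store persistence layer",
    "speech input via whisper",
]

class MemoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(parsed.query).get("q", [""])[0]
        # Naive scoring: count shared words (stand-in for cosine similarity).
        hits = sorted(
            DOCS,
            key=lambda d: -len(set(query.lower().split()) & set(d.lower().split())),
        )
        body = json.dumps({"query": query, "results": hits[:2]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port and serve in the background.
server = ThreadingHTTPServer(("127.0.0.1", 0), MemoryHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
```

On the Emacs side this would reduce to a single `url-retrieve` (or `request.el`) call returning JSON, which keeps the elisp part thin.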

Problems

It's not clear to me which of the proposed approaches is cleaner/more Emacs-like. I think I could ship the basic version that uses Python by the end of next week. In the future, it's not unlikely that a standard API for such a memory service will emerge.

[^1]: Embedding models like sentence-transformers are much cheaper than OpenAI embeddings; see this article.

rksm commented 1 year ago

This would be super++ awesome! Feel free to use whatever language makes the most sense for you. Happy to transition things into elisp later where suitable.