rustformers / ecosystem

Discussion and planning around the rustformers ecosystem

feat: lightweight, pure rust k-ANN vector database for long-term memory/knowledge-base #2

Open jon-chuang opened 1 year ago

jon-chuang commented 1 year ago

I think the next step in the project is a lightweight ANN (approximate k-nearest-neighbour search) vector database. Applications:

  1. Document store over local documents: code bases, journals, articles. Input: directory with text files.
  2. Chrome-assistant: A memory over your currently and recently opened tabs with llama-wasm
  3. Mobile: similar to the local document store.

Details:

  1. The k-ANN database should always be an optional dependency compiled under a feature flag.
  2. We will reuse the loaded model for encoding. See: here, section 4.3, which suggests using the analogue of the [CLS] token; see e.g. LlamaIndex. I'm not too sure about decoding; it can just return the text string from the metadata. Alternatively, one can load an embedding/decoding model that is serialized to ggml. (A sketch of the encoding side follows this list.)
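A hedged sketch of the encoding idea. Nothing here is the real llama-rs API: the `HiddenStates` trait is a hypothetical stand-in for whatever hook eventually exposes per-token hidden states. The point is only that in a decoder-only model, the final token's hidden state plays the role of BERT's [CLS] summary vector.

```rust
/// Hypothetical stand-in for a llama-rs hook exposing per-token hidden
/// states; not part of the current API.
trait HiddenStates {
    /// One hidden-state vector per token of `text`.
    fn hidden_states(&self, text: &str) -> Vec<Vec<f32>>;
}

/// Embed a document chunk as the hidden state of its final token. In a
/// decoder-only model the last token attends to the whole sequence, so
/// it is the closest analogue of BERT's [CLS] token.
fn embed_chunk(model: &impl HiddenStates, chunk: &str) -> Vec<f32> {
    model
        .hidden_states(chunk)
        .pop()
        .expect("chunk produced no tokens")
}
```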

Problem definition:

  1. We are OK trading off a bit of performance for something with a minimal surface area.
  2. We want the database to be persistent: it should persist the index after generation. The model is very similar to ggml's: we generate the artifact (index) once, or periodically, and then load it into memory.

Options:

  1. :x: Connect to an existing vector database (Qdrant, Milvus, Pinecone). But these are heavy dependencies, and many of their features are about scaling out (cloud-native). We like transparency and owning the artifacts involved, and are willing to trade off a bit of performance and/or implementation complexity for that aim.
  2. :x: Compile faiss as an optional dependency. Still a pretty huge dependency.
  3. :crab: Something Rust-native, e.g. hora. Not actively maintained, but it still works and I've run it locally. We can slice out the core functionality (e.g. just HNSW). It already has a persisted format for the index, and we can add mmap support as an optimization. Hopefully the slice comes out to about 2K LoC. (A usage sketch follows this list.)
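To make option 3 concrete, here is a minimal sketch against hora's published API (hora = "0.1"); a sliced-out fork may rename things, so treat the exact signatures as assumptions. It builds an index over toy vectors, runs a k-ANN query, and exercises the persisted format.

```rust
use hora::core::ann_index::{ANNIndex, SerializableIndex};
use hora::core::metrics::Metric;
use hora::index::hnsw_idx::HNSWIndex;
use hora::index::hnsw_params::HNSWParams;

fn main() {
    let dim = 4;
    let mut index = HNSWIndex::<f32, usize>::new(dim, &HNSWParams::<f32>::default());

    // Toy embeddings; in practice these come from the encoding model.
    let docs = vec![
        vec![0.10, 0.20, 0.30, 0.40],
        vec![0.90, 0.80, 0.70, 0.60],
        vec![0.11, 0.19, 0.31, 0.39],
    ];
    for (i, v) in docs.iter().enumerate() {
        index.add(v, i).unwrap();
    }
    index.build(Metric::Euclidean).unwrap();

    // k-ANN query: ids of the two stored vectors nearest the query.
    let query = vec![0.10, 0.20, 0.30, 0.40];
    println!("{:?}", index.search(&query, 2)); // e.g. [0, 2]

    // Persistence: dump once, reload on later runs. The mmap idea would
    // replace this full deserialization.
    index.dump("docs.idx").unwrap();
    let reloaded = HNSWIndex::<f32, usize>::load("docs.idx").unwrap();
    assert_eq!(reloaded.search(&query, 2), index.search(&query, 2));
}
```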

Plan:

  1. Use prompt engineering to let the model request a lookup via a special Unicode sequence; llama-rs will detect the sequence and trigger a database lookup. (A detection sketch follows this list.)
    • It's not clear how well this would work. Ideally we would prompt-tune this; LoRA-based fine-tuning might work.
  2. Implement either partial encoding with the existing LLM, or allow loading a dedicated embedding model.
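A sketch of the detection step in plan item 1. The `⟦search:…⟧` marker is invented for illustration; the real sequence would come out of the prompt engineering or fine-tuning above.

```rust
const OPEN: &str = "⟦search:";
const CLOSE: &str = "⟧";

/// Returns the query text if the generated text contains a complete
/// sentinel sequence, signalling that llama-rs should pause generation
/// and run a k-ANN lookup.
fn detect_search_request(generated: &str) -> Option<&str> {
    let start = generated.rfind(OPEN)? + OPEN.len();
    let end = generated[start..].find(CLOSE)? + start;
    Some(generated[start..end].trim())
}

fn main() {
    let stream = "The user asked about lifetimes. ⟦search: rust lifetime elision rules⟧";
    assert_eq!(detect_search_request(stream), Some("rust lifetime elision rules"));
}
```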
hhamud commented 1 year ago

Interesting, like a Rust-specific version of this?

It would also be interesting if we could use this to store prompts and their outputs in the database and press the up key to re-use previous prompts or their outputs. But we wouldn't even need a vector database for that specifically; we could do it with a typical SQL database (see the sketch below).

We would also need to revisit https://github.com/rustformers/llama-rs/issues/56
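A sketch of that SQL route using the rusqlite crate; the table name and schema are made up for illustration. Each generation records a prompt/output pair, and up-arrow recall just pages backwards through the rows.

```rust
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open("history.db")?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history (
             id     INTEGER PRIMARY KEY,
             prompt TEXT NOT NULL,
             output TEXT NOT NULL
         )",
        (),
    )?;

    // Record one prompt/output pair per generation.
    conn.execute(
        "INSERT INTO history (prompt, output) VALUES (?1, ?2)",
        ("why is the borrow checker unhappy?", "aliasing plus mutation"),
    )?;

    // Up-arrow recall: the most recent prompts, newest first.
    let mut stmt = conn.prepare("SELECT prompt FROM history ORDER BY id DESC LIMIT 10")?;
    let recent: Vec<String> = stmt
        .query_map((), |row| row.get(0))?
        .collect::<Result<_>>()?;
    println!("{recent:?}");
    Ok(())
}
```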

jon-chuang commented 1 year ago

> Interesting, like a Rust-specific version of this?

Yes, there are many options available, but they mainly offer the same types of indexes.

> re-use previous prompts or their outputs

The problem with a hash table or KV store is that natural-language queries are rarely exactly the same, especially if you are not averaging over the human population but just running locally.

Milvus has already promoted similarity-search-based "caching" as one of its applications (repo); a sketch of the idea is below.
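A self-contained sketch of that similarity-based caching idea; the embeddings here are toy values, and in practice they would come from the model's encoder. A lookup hits only when a cached query's embedding is within a cosine-distance threshold.

```rust
struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>, // (query embedding, cached answer)
    threshold: f32,                   // max cosine distance that counts as a hit
}

impl SemanticCache {
    /// Return the cached answer for the nearest stored query, if any is
    /// within the threshold.
    fn lookup(&self, query: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(e, answer)| (cosine_distance(query, e), answer))
            .filter(|(d, _)| *d <= self.threshold)
            .min_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
            .map(|(_, answer)| answer.as_str())
    }

    fn insert(&mut self, query: Vec<f32>, answer: String) {
        self.entries.push((query, answer));
    }
}

fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    let mut cache = SemanticCache { entries: Vec::new(), threshold: 0.05 };
    cache.insert(vec![1.0, 0.0], "cached answer".to_string());
    // A near-duplicate query hits; an unrelated one misses.
    assert_eq!(cache.lookup(&[0.99, 0.01]), Some("cached answer"));
    assert_eq!(cache.lookup(&[0.0, 1.0]), None);
}
```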

philpax commented 1 year ago

I think this is out of scope for this repository specifically. I could see a batteries-included implementation being built atop llama-rs, but it's unlikely to feature an implementation of a vector database itself because our focus is specifically on robust, fast inference of LLMs.

jon-chuang commented 1 year ago

> our focus is specifically on robust, fast inference of LLMs.

Yes, but I think the broader focus is "low-resource, low-dependency embedded LLM toolchain".

I can definitely see the sliced-out k-ANN code living in a separate repo (perhaps under this org), compiled in as an optional dependency of llama-rs and available in the CLI (from crates.io it would be `cargo install llama-rs --features "knowledge-base"`). A sketch of the wiring is below.
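The manifest wiring for that would look roughly like this; the crate name and path are placeholders, not a real rustformers crate.

```toml
# Hypothetical llama-rs Cargo.toml excerpt: the k-ANN crate is optional
# and only compiled when the "knowledge-base" feature is enabled.
[dependencies]
knowledge-base = { path = "../knowledge-base", optional = true }

[features]
knowledge-base = ["dep:knowledge-base"]
```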

hhamud commented 1 year ago

"low-resource, low-dependency embedded LLM toolchain".

Literally what I was thinking of yesterday

jon-chuang commented 1 year ago

I've made an issue here to sound out the idea: https://github.com/ggerganov/llama.cpp/issues/930

philpax commented 1 year ago

> I can definitely see the sliced-out k-ANN code living in a separate repo (perhaps under this org), compiled in as an optional dependency of llama-rs and available in the CLI (from crates.io it would be `cargo install llama-rs --features "knowledge-base"`).

Sure, but I don't see why it would have to be part of llama-rs specifically. The CLI is really just a demo application for the library; it doesn't aspire to higher functionality than that.

I'm not opposed to having this kind of functionality - having a full-stack solution for using an LLM to do knowledge-base inference would be great - but I think it's a hard sell to make it part of this crate specifically. By analogy, we're like hyper, not reqwest: we're not trying to solve all the problems, just the core problem that enables other people to solve theirs.

jon-chuang commented 1 year ago

> but I think it's a hard sell to make it part of this crate specifically.

I'm in agreement here. But do you think the rustformers org more generally could be expanded to this broader scope of a low-resource LLM toolchain, and host the broader-scoped llama-rs-toolchain?

philpax commented 1 year ago

Sorry - meant to get back to you earlier. Yeah, I think having this as part of a larger solution would be great. I've created this repository to track issues that aren't directly related to llama-rs, but are for the ecosystem around it.

Has anyone experimented with this? Are there any estimates on how much work it would be?

jon-chuang commented 1 year ago

I’ve not experimented, but it’s on my (currently very long) todo list. I estimate it could be a week of work to get the code in place, but it may take some additional experimentation with prompting (e.g. to emit sequence of tokens indicating search action) to get the models to work well with the knowledge base.

I’ll hopefully get to it once I’m back from holiday.

hhamud commented 1 year ago

Any updates on this? @jon-chuang

itsbalamurali commented 1 year ago

@jon-chuang @hhamud & @philpax I've taken a stab at porting chroma to rust: https://gist.github.com/itsbalamurali/118e7ce18f1519f26780b9845dee4e87 has the basic structure of it.

It still needs: https://github.com/chroma-core/chroma/blob/d98be4d0bfb760155d9f85c9012952ef459c10a6/chromadb/db/clickhouse.py#L583

hhamud commented 1 year ago

> @jon-chuang @hhamud & @philpax I've taken a stab at porting chroma to rust: https://gist.github.com/itsbalamurali/118e7ce18f1519f26780b9845dee4e87 has the basic structure of it.
>
> It still needs: https://github.com/chroma-core/chroma/blob/d98be4d0bfb760155d9f85c9012952ef459c10a6/chromadb/db/clickhouse.py#L583

Nice, do you have an actual full repo to share rather than just a gist?

shkr commented 1 year ago

I am interested in implementing a Rust knowledge base for LLMs.

zicklag commented 1 year ago

Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

ealmloff commented 1 year ago

I implemented an in-memory version of this as part of Floneum. Here is the relevant code: https://github.com/floneum/floneum/blob/master/plugin/src/vector_db.rs

instant-distance is fairly easy to work with and actively maintained; a usage sketch is below.
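A short sketch against instant-distance's README-style API (Builder, Search, and the Point trait); the embeddings and payload strings are toy values.

```rust
use instant_distance::{Builder, Point, Search};

#[derive(Clone, Copy, Debug)]
struct Embedding([f32; 3]);

impl Point for Embedding {
    // Euclidean distance between the two vectors.
    fn distance(&self, other: &Self) -> f32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a - b).powi(2))
            .sum::<f32>()
            .sqrt()
    }
}

fn main() {
    let points = vec![
        Embedding([0.1, 0.2, 0.3]),
        Embedding([0.9, 0.8, 0.7]),
    ];
    let values = vec!["rust ownership notes", "gardening journal"];

    // Build an HNSW map from embeddings to their payloads.
    let map = Builder::default().build(points, values);

    // Query the nearest stored value for a new embedding.
    let mut search = Search::default();
    let item = map.search(&Embedding([0.1, 0.2, 0.25]), &mut search).next().unwrap();
    println!("{} (distance {})", item.value, item.distance);
}
```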

shkr commented 1 year ago

> Cozo might be useful. I'm totally out-of-the-loop, so it might not work for what you're looking for. I figured I'd share just in case.

Thanks, Cozo is very interesting and might solve the use case I was thinking of.

ayourtch commented 1 year ago

I saw https://github.com/tensorchord/pgvecto.rs today - it fits the bill of “Rust only”. (Admittedly I am too new to this field to fully understand whether it's relevant, but someone might find it useful.)