justyns / silverbullet-ai

Plug for SilverBullet to integrate LLM functionality
https://ai.silverbullet.md/
GNU Affero General Public License v3.0

Vector Embeddings Storage and Search #34

Closed justyns closed 1 month ago

justyns commented 1 month ago

Background thread: https://community.silverbullet.md/t/thoughts-on-applications-of-ai-or-smarts-in-silverbullet/559

Some sort of vector search would be great to have, but also raises some new issues explained below.

  1. Storage - Where do we store the embeddings?
     a. We could potentially store them in the existing sqlite db and perform our own query/search logic. There might be some performance implications, but it could be done without 3rd party dependencies.
     b. We could require a new 3rd party dependency, something like ChromaDB or Redis.
     c. Use the sqlite-vec extension.
     d. ???
  2. Embeddings - How do we generate the embeddings? Locally or not?
     a. Tensorflow.js is a pretty big dependency, but would be needed to generate embeddings locally.
     b. The option I'm leaning towards: relying on the LLM provider to offer an embeddings API instead. Most major providers already offer this API, and some local providers like Ollama do too.
     c. What happens when the model is changed? Embeddings would either have to be regenerated, or the model used to generate them needs to be configured separately.
  3. Search - TBD: Is it already possible to replace SB's built-in search with one implemented by a plug?
  4. RAG - A separate issue, but should we automatically search through embeddings and send context to the LLM for the various commands?

The biggest question to me is where to store the embeddings. Ideally I don't want to require a 3rd party component/server, but that may end up being the best option. It's probably worth testing the sqlite + javascript route first.
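The "javascript route" above boils down to a linear scan with cosine similarity. A minimal sketch of that idea follows; `cosineSimilarity`, `searchEmbeddings`, and `PageEmbedding` are illustrative names for this sketch, not part of the SilverBullet or plug API:

```typescript
// Hypothetical sketch: brute-force semantic search over embeddings
// already loaded from the datastore into memory.

type PageEmbedding = { page: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every indexed page against the query embedding, best match first.
function searchEmbeddings(
  query: number[],
  index: PageEmbedding[],
  topK = 10,
): { page: string; score: number }[] {
  return index
    .map(({ page, embedding }) => ({
      page,
      score: cosineSimilarity(query, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

For a few thousand pages this O(n) scan stays well under a second, which matches the performance observations later in this thread.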

zefhemel commented 1 month ago

Some pointers based on my own knowledge/research:

Note that if you'd want to use some SQLite extension with Deno, you'd have to end up including an entirely new instance (maybe WASM compiled version) of SQLite to SilverBullet — probably not desirable. So if you don't use the native SilverBullet data store, I'd opt for a completely external vector storage solution.

On embeddings: I looked at tensorflow.js as well. I think this is somewhat doable, but you'd probably also have to load a big model (these tend to be dozens or hundreds of megabytes). This cannot practically be included in a plug bundle, so it would have to be kept elsewhere. This is why I mentally crossed tensorflow off my list.

justyns commented 1 month ago

Thanks for the feedback

You're right that the embedding model would be big (relative to the js at least). A "small" model like all-MiniLM-L6-v2 is still around 100 MB. Maybe SilverBullet could offer a way to cache large downloaded files like this that don't get added by the git-plug and aren't synced to each client?

Right now I am leaning towards relying on apis to do the embedding generation. tensorflow.js and fastembed-js seem like good options, but it's also something I'm not sure I want to maintain and troubleshoot. There are a lot of issues I can see coming up from trying to embed that logic. Maybe in a future version.

I'm going to try your idea first of storing the embeddings in the silverbullet datastore and looping over all of them. I have a decently sized space folder, so I feel like it'll give me a good idea if it's practical or not.

justyns commented 1 month ago

Hey @zefhemel, how would you integrate new search results/providers into the search plug? I saw there was a `query:full-text` event, but it doesn't seem to ever be triggered. `queryProvider` also doesn't look used.

I ended up copying part of the search plug and making my own virtual page to show search results to test with, but it'd be nice to integrate it.


Also re: speed - so far it actually seems fine. I was expecting it to be really slow, but it's not. I tested out a space with around 1500 notes of various sizes and it barely took 2-3 minutes to generate embeddings for all of them using ollama (once the model is loaded after the first request). Searching over all of those embeddings takes a second or two, but it's not much of a delay.

zefhemel commented 1 month ago

Yeah there's no infrastructure to extend search. It's not the best part of SB I'd say 😀 I would also opt for just creating a parallel implementation and call it "semantic search" or something fancy.

zefhemel commented 1 month ago

Happy to hear that the performance looks ok btw! Sometimes the dumbest solutions are just fine. SilverBullet queries are also implemented via simple row by row table scans. Who needs fancy indexes if your dataset isn't that big? (Famous last words)

justyns commented 1 month ago

Okay, I just merged https://github.com/justyns/silverbullet-ai/pull/37 in. silverbullet-ai.plug.js in main can be used now. The embedding stuff is disabled by default, but https://ai.silverbullet.md/Configuration/Embedding%20Models/ has the instructions/config examples. I'm testing with this:

```yaml
ai:
  indexEmbeddings: true
  embeddingModels:
  # Only the first model is currently used
  - name: ollama-all-minilm
    modelName: all-minilm
    provider: ollama
    # baseUrl: http://localhost:11434
    baseUrl: https://ollama.lan.mydomain
    requireAuth: false
```

I don't know enough about the different models to recommend one over another, but all-minilm is working for me so far. Since generating embeddings is apparently not very intensive, I do think using a local ollama instance is the best choice for now.

justyns commented 1 month ago

Also for anyone who tries this, I'd be very interested in stats on how long it takes to reindex your whole space and whether you see much of a delay when searching with the `AI: Search` command.

For me, on my M1 MacBook, it only took a minute or two to index ~1500 pages with ollama. But on my actual server, with SB and Ollama hosted on separate servers, it was around 7 minutes. Still not bad for a mostly one-time operation, though.

justyns commented 1 month ago

This is officially part of 0.2.0