sobelio / llm-chain

`llm-chain` is a powerful Rust crate for building chains in large language models, allowing you to summarise text and complete complex tasks.
https://llm-chain.xyz
MIT License

How to only add_{texts,documents} if they aren't already in the vector store? #232

Open silvergasp opened 10 months ago

silvergasp commented 10 months ago

First of all I'd like to thank you for all your hard work. This is a great set of tools, and I'm really enjoying using them. Let me first describe my use case.

I have a directory of files that I read and convert to embeddings/vectors with llm-chain-qdrant. This works great! However, I'm running into issues because some of these files are intermittently edited. If I update one file, I should really only be adding one new vector to the vector store. But since there is no way to compare which files are in the vector store with which files have changed on disk, you end up having to compute/fetch a new embedding anyway, which is done here. I'd like to propose adding a third method to the VectorStore trait, something like document_exists(). That would let you add only the documents that aren't already in the VectorStore.

I have a workaround in the works that queries the database directly. But I think a trait method would be a lot tidier. What are your thoughts on this?
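To make the idea concrete, here is a rough sketch of the shape I have in mind. It is not the llm-chain API: the `DocumentExists` trait, `InMemoryStore`, `content_key`, and `add_texts_if_missing` are all hypothetical names for illustration, the store is a plain in-memory map rather than Qdrant, and everything is synchronous to keep it short. Documents are keyed by a hash of their text, so an unchanged file is skipped before any embedding is computed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical trait (not part of llm-chain): a store that can report
/// whether a document with the given content key is already present.
pub trait DocumentExists {
    fn document_exists(&self, key: u64) -> bool;
}

/// Derive a key from the document text.
/// Assumption: DefaultHasher is fine for an in-process sketch; a persistent
/// store would want a hash that is stable across builds (e.g. SHA-256).
pub fn content_key(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for a real vector store, mapping content key to vector.
#[derive(Default)]
pub struct InMemoryStore {
    vectors: HashMap<u64, Vec<f32>>,
}

impl DocumentExists for InMemoryStore {
    fn document_exists(&self, key: u64) -> bool {
        self.vectors.contains_key(&key)
    }
}

impl InMemoryStore {
    /// Embed and insert only the texts whose content key is not stored yet.
    /// `embed` stands in for whatever computes or fetches the embedding.
    /// Returns the number of texts that were actually embedded.
    pub fn add_texts_if_missing<F>(&mut self, texts: &[&str], mut embed: F) -> usize
    where
        F: FnMut(&str) -> Vec<f32>,
    {
        let mut added = 0;
        for &text in texts {
            let key = content_key(text);
            if self.document_exists(key) {
                // Already stored: skip the (expensive) embedding call.
                continue;
            }
            self.vectors.insert(key, embed(text));
            added += 1;
        }
        added
    }
}
```

One caveat with keying on content: an edited file gets a new key, so the vector for its old contents stays in the store unless it is removed separately. Keying on the file path plus a content hash would be one way to handle that.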

silvergasp commented 10 months ago

Just FYI I'm happy to work on this, it just might take me a while to update all the trait implementations, and I'd like to get some expert eyes before I start.

williamhogman commented 10 months ago

Hey :)

Yeah, I think that makes sense. Let's try to get it added. Let me know on Discord if you need any help/pair programming :)