snexus / llm-search

Querying local documents, powered by LLM
MIT License

feature request - api endpoint to vectordb and/or db update button in UI #63

Closed · Hisma closed 8 months ago

Hisma commented 8 months ago

OK, I see you are still actively making updates to this application, so this is a feature that I think would be great to add at some point, especially since it looks like you're focusing on updating embeddings.

Let's say a user wants to have their data stored in an Amazon S3 bucket. It would be wonderful if there were a way for the app to have an API endpoint that points to the vector DB. There's an application I know of that is designed to work with AWS & Pinecone (& other hosted vector DBs). Its purpose is to "listen" for changes on the AWS datastore, and any time a new file is added, it updates the vector DB on the fly, so that the process of updating embeddings is instant & autonomous.

I told the dev about your app, and he said they could easily support it, provided there was an api endpoint they could point to.

Not sure what challenges this would add, especially since up to now it seems this app is mostly run locally (aside from OpenAI API support).

But perhaps something to add to the to-do list.

Another one that would be helpful, and that I think should be easy to implement, is a button to update the vector DB inside the UI.
Basically, it could just run the llmsearch index create command from inside the UI. That, combined with your recent updates that appear to incrementally update the DB with only new files rather than rebuild it every time the command is run, would be useful!

That would effectively mean end users never need to interact with the CLI to use the application.
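A minimal sketch of what that button could look like, assuming the UI is Streamlit and simply shells out to the existing llmsearch index create command (the `-c config.yaml` flag here is an assumption, not a confirmed part of the CLI):

```python
# Hypothetical sketch: a UI button that shells out to the existing CLI.
# The config path and "-c" flag are assumptions, not confirmed flags.
import subprocess

import streamlit as st

if st.button("Update index"):
    with st.spinner("Updating embeddings..."):
        result = subprocess.run(
            ["llmsearch", "index", "create", "-c", "config.yaml"],  # assumed flags
            capture_output=True,
            text=True,
        )
    if result.returncode == 0:
        st.success("Index updated.")
    else:
        st.error(result.stderr or "Index update failed.")
```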

snexus commented 8 months ago

Thanks for the suggestions - your second idea shouldn't be too hard to implement - I'll add it to the to-do list.

The embedding update feature was necessary for this app - before that, even for a small/medium document base of 500 MB it was painful to recreate from scratch.

I have to mention that the update process works at the file level (which is a higher level than the update functionality in vector DBs, which works at the individual-chunk level) - it has to scan all the existing files (as specified in the configuration) and figure out what was changed/deleted/added, based on the files' hashes. So the process is still not instant, but quick enough to do updates frequently, if needed.
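A minimal sketch of that file-level diffing idea, using content hashes to classify files as added, changed, or deleted (function names and the hash choice are illustrative, not the project's actual code):

```python
# Illustrative sketch of file-level change detection via content hashes;
# not the project's actual implementation.
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.blake2b()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_files(root: Path, stored: dict):
    """Compare files on disk against previously stored {path: hash} entries.

    Returns (added, changed, deleted) sets of path strings.
    """
    current = {str(p): file_hash(p) for p in root.rglob("*") if p.is_file()}
    added = current.keys() - stored.keys()
    deleted = stored.keys() - current.keys()
    changed = {p for p in current.keys() & stored.keys() if current[p] != stored[p]}
    return added, changed, deleted
```

Only the added/changed files would then need re-embedding, which is why the scan is quick but not instant.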

About the API endpoint - trying to understand the goal: do you want to host llmsearch in the cloud, with an exposed API that would allow adding or removing documents from the internal vector store? So when documents are changed on the cloud storage, some service or app would ping the API with a request to update the embeddings?
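For what it's worth, a minimal sketch of what such an endpoint could look like, using FastAPI (purely illustrative; llm-search does not currently expose an API, and `update_index` is a hypothetical stand-in for the incremental re-index described above):

```python
# Hypothetical sketch of a re-index trigger endpoint. FastAPI is an
# assumption; llm-search does not currently expose an API.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def update_index() -> None:
    """Hypothetical stand-in for the incremental re-index logic."""
    ...

@app.post("/index/update")
def trigger_update(background_tasks: BackgroundTasks):
    # Return immediately; the potentially slow re-index runs in the background.
    background_tasks.add_task(update_index)
    return {"status": "update scheduled"}
```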

Hisma commented 8 months ago

Thanks! Yeah, the update button would be a nice convenience, as you could at that point leave the application running at all times and do everything you need in the UI.

The API endpoint - yes, it would be for a cloud deployment, or a partial cloud deployment (i.e. the app runs on-premise but the corpus of data is in the cloud). Think of a team of researchers that use your app but all live in different geographic locations... putting the corpus in the cloud where everyone can add to it would be the easiest solution, with a middleware service used to update your ChromaDB any time a file change is detected. I see this as something that would be very low on the list, but something you can consider in the future when you have run out of things to do haha.

snexus commented 8 months ago

The update button feature should be available in this branch - https://github.com/snexus/llm-search/tree/feature-webapp-embeddings-udpate, please test if it works for you.

> The API endpoint - yes, it would be for a cloud deployment, or a partial cloud deployment (i.e. the app runs on-premise but the corpus of data is in the cloud). Think of a team of researchers that use your app but all live in different geographic locations... putting the corpus in the cloud where everyone can add to it would be the easiest solution, with a middleware service used to update your ChromaDB any time a file change is detected. I see this as something that would be very low on the list, but something you can consider in the future when you have run out of things to do haha.

I will think about it :)

Hisma commented 8 months ago

Busy day yesterday, sorry.

[screenshot: error message]

I get this error. I think this may just mean I need to first recreate the index with the new version of the app before I can use the update feature, correct?

Hisma commented 8 months ago

ooof.

When I re-ran the ingest script to re-create the embeddings index, I ran the same test query I always run, and got a result I was not expecting. In essence, it couldn't answer the question. Any theory as to why the LLM could have gotten "dumber"?

snexus commented 8 months ago

> When I re-ran the ingest script to re-create the embeddings index, I ran the same test query I always run, and got a result I was not expecting. In essence, it couldn't answer the question. Any theory as to why the LLM could have gotten "dumber"?

It shouldn't happen - the new functionality doesn't touch the embedding logic or the querying logic. Can you see the proper chunks being retrieved when using this version of the app?

Hisma commented 8 months ago

> When I re-ran the ingest script to re-create the embeddings index, I ran the same test query I always run, and got a result I was not expecting. In essence, it couldn't answer the question. Any theory as to why the LLM could have gotten "dumber"?

> It shouldn't happen - the new functionality doesn't touch the embedding logic or the querying logic. Can you see the proper chunks being retrieved when using this version of the app?

Yep, this was on me. I added some new files to the DB, and adding them caused the search results to be worse. When I deleted the new files and re-ran the search, I got the results I wanted.

However, when I tried adding a new file and using the on-the-fly update button, I got this error:

[screenshot: out-of-memory error]

Something caused my GPU to run out of memory.

snexus commented 8 months ago

> Yep, this was on me. I added some new files to the DB, and adding them caused the search results to be worse. When I deleted the new files and re-ran the search, I got the results I wanted.

Good to hear, you got me concerned for a moment, haha.

> However, when I tried adding a new file and using the on-the-fly update button, I got this error.

Updating the index requires additional GPU memory (since it uses the embedding model). Is it possible that your memory was almost full, and running the update caused an out-of-memory error as a result? Could you please test with a smaller model, just to confirm that's the problem (or monitor GPU VRAM usage to confirm it)?

If that's the case - I am not sure what a good solution would be. Perhaps unload the model, do the indexing, then reload the model?
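A rough sketch of that unload/index/reload cycle, assuming a PyTorch-backed model (the `load_model` and `update_index` callables are hypothetical stand-ins, not llm-search functions):

```python
# Rough sketch of the unload -> index -> reload cycle; helper names are
# hypothetical stand-ins, not llm-search functions.
import gc

import torch

def reindex_with_model_unloaded(model, load_model, update_index):
    del model                 # drop the reference to the loaded LLM
    gc.collect()              # free the host-side objects
    torch.cuda.empty_cache()  # return cached VRAM to the driver
    update_index()            # the embedding model now has the VRAM to itself
    return load_model()       # reload the LLM once indexing is done
```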

Hisma commented 8 months ago

I have 2 GPUs, and llama.cpp does a great job of auto-splitting models between the 2 cards. Here's what the model looked like before doing the embeddings update:

[screenshot: GPU memory usage before the update]

Here's what happened when I tried to do the update:

[screenshot: GPU memory usage during the update]

It loaded all the embeddings onto the primary GPU, rather than recognizing that there was a lot of empty space on the 2nd GPU.
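That default is consistent with how embedding libraries typically pick the first CUDA device unless told otherwise. One conceivable mitigation, purely a sketch assuming the embedder is a sentence-transformers model (the model name here is illustrative), would be to pin the embedding model to the emptier second GPU:

```python
# Assumption: the embedder is a sentence-transformers model. Pinning it to
# "cuda:1" keeps it off the primary GPU that llama.cpp already fills.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda:1")
embeddings = embedder.encode(["example chunk of text"])
```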

So yes, I think the only way to do this elegantly is to unload the model, index, then reload.
I don't know how much effort that would take on your end, or whether you think it's worth it. Doing manual indexing isn't too much work, though it does ultimately give the application more of a "POC" feel when its functionality is split between different arguments that need to be executed from the command line.

snexus commented 8 months ago

Thanks for confirming, I will try to implement it in the next couple of days.

> Doing manual indexing isn't too much work, though it does ultimately give the application more of a "POC" feel when its functionality is split between different arguments that need to be executed from the command line.

You are right - having it in the same UI makes for a better experience. I think it is worth the effort.

Hisma commented 8 months ago

Agreed, it looked so clean when all I had to do was press a button to update! It will be a very nice UI/UX once this gets properly implemented. End users would only need to interact with the CLI to start the app (which is easy to automate w/ a script, as I have done), and could then just leave the app running at all times.

snexus commented 8 months ago

Pushed the changes to the same branch, which hopefully will solve the problem - please test on your side when you have time. Thank you!

Hisma commented 8 months ago

Just tested. It works great. I made sure to watch GPU memory & it was able to load & unload no problem. Thank you!