nchapman / pulsar

Mozilla Public License 2.0

Add vector database and Rust API as a general document index #6

Open nchapman opened 7 months ago

nchapman commented 7 months ago

We need a general-purpose vector index for RAG that can store different data sets and kinds of content. We don't know the complete schema yet because we aren't sure how we'll handle external data sources, but you can imagine it being something like this:

The workflow would likely be that you search the index for a matching document and, if it is a large document, make a separate call to fetch the original data. In the internal use case this would probably mean using WatermelonDB to store the conversations and messages, which we would then index here. When using the index, we would search for matching documents, get their ids, and then query WatermelonDB for the original text. We can consider other patterns, but this seems like the most flexible way to support many different sources that may represent a lot of data.
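A minimal sketch of that two-step lookup, to make the shape of the API concrete. Everything here is an assumption for illustration: the `IndexEntry` fields, the `VectorIndex` and `fetch_originals` names, and the brute-force cosine search (a stand-in for whatever SQLite-backed index is eventually chosen). A `HashMap` stands in for the source store such as WatermelonDB, and embeddings are assumed to be computed elsewhere.

```rust
use std::collections::HashMap;

/// One indexed entry: an embedding plus a pointer back to the original record.
/// Field names are hypothetical; the issue does not fix a schema.
#[derive(Debug, Clone)]
pub struct IndexEntry {
    pub id: String,          // id of the original record in the source store
    pub source: String,      // which data set it came from, e.g. "messages"
    pub embedding: Vec<f32>, // precomputed embedding of the content
}

/// In-memory stand-in for the vector index (not backed by SQLite here).
pub struct VectorIndex {
    entries: Vec<IndexEntry>,
}

impl VectorIndex {
    pub fn new() -> Self {
        VectorIndex { entries: Vec::new() }
    }

    pub fn add(&mut self, entry: IndexEntry) {
        self.entries.push(entry);
    }

    /// Step 1: brute-force cosine search. Returns up to `k` (id, score)
    /// pairs, best match first. Only ids come back, never the original text.
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(String, f32)> {
        let mut scored: Vec<(String, f32)> = self
            .entries
            .iter()
            .map(|e| (e.id.clone(), cosine(&e.embedding, query)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        scored.truncate(k);
        scored
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Step 2: resolve ids to the original text in a separate store.
/// Ids missing from the store are skipped.
pub fn fetch_originals(store: &HashMap<String, String>, ids: &[String]) -> Vec<String> {
    ids.iter().filter_map(|id| store.get(id).cloned()).collect()
}
```

A caller would run `index.search(&query_embedding, k)`, collect the returned ids, and pass them to `fetch_originals` (or the real store's query). Keeping the original text out of the index is what lets the same index cover sources of very different sizes.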

Ideally we'd use SQLite for storing the index. Here's a library to investigate: https://github.com/asg017/sqlite-vss

ospfranco commented 5 months ago

Reading about RAG now. So, if I understand this correctly, you want a table inside the db that contains the document information. Before that, we would have to add/compile that extension into the SQLite instance so the index can be generated/stored/queried?

nchapman commented 5 months ago

Yep that's right! Let's chat about this one live. Some things we should consider:

ospfranco commented 4 months ago

Since Nick is on vacation this week, this will probably not get done, as I need to discuss it with him before building anything @andriikrainii