hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
http://prompttools.readthedocs.io
Apache License 2.0
2.56k stars, 216 forks

LanceDB Integration #71

Closed AyushExel closed 11 months ago

AyushExel commented 11 months ago

LanceDB is an open-source, serverless, setup-free, multi-modal vector database.

@NivekT, regarding your comments about query_builder: it is essentially a function that queries the database and accepts some optional arguments, which can be passed with query_args. But feel free to edit if something doesn't make sense.
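
To illustrate the idea, here is a minimal sketch of a query_builder-style function. It assumes a stand-in table object: `StubTable`, `default_query_builder` and `run_query` are hypothetical names used only for this example, and none of this is the LanceDB or prompttools API.

```python
from typing import Any, Callable, Dict, List, Optional


class StubTable:
    """Hypothetical stand-in for a vector table (NOT the LanceDB API)."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def search(self, text: str, limit: int = 2) -> List[Dict[str, Any]]:
        # Toy "similarity": rank rows by how many words they share with the query.
        words = set(text.lower().split())
        ranked = sorted(
            self.rows,
            key=lambda row: len(words & set(row["text"].lower().split())),
            reverse=True,
        )
        return ranked[:limit]


def default_query_builder(table: StubTable, text: str, **query_args: Any):
    """Query the table, forwarding any optional arguments unchanged."""
    return table.search(text, **query_args)


def run_query(
    table: StubTable,
    text: str,
    query_args: Optional[Dict[str, Any]] = None,
    query_builder: Callable[..., Any] = default_query_builder,
):
    """Run one query; callers may swap in their own query_builder."""
    return query_builder(table, text, **(query_args or {}))


table = StubTable([{"text": "red apple"}, {"text": "green pear"}, {"text": "red car"}])
top_hit = run_query(table, "red apple", {"limit": 1})
```

Because the optional arguments travel as a plain dict, the same builder can serve many configurations without changing its signature.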

NivekT commented 11 months ago

That makes sense about query_builder.

  1. If I understand correctly, it is doing an approximate nearest neighbor search. What happens if users want to compare that against the Python full-text search (the experimental feature)? Is that something we want to support here?
  2. Within nearest neighbor search, is the setup always as shown? Or should we allow users to override the default query_builder? What if they want to pass in nprobes, refine_factor, etc.? The Weaviate example has a query_builder like that.

AyushExel commented 11 months ago

  1. I think we can allow FTS, but it would add some setup steps, like installing tantivy and Rust. So maybe let's wait until it becomes more mature (we're working on it).
  2. Yeah, we can allow users to specify n_probes and refine_factor, but I think those should still go in the query_args dict. Accepting both different functions and different args as input might not be needed, since we can customize the operation simply by passing different arguments. Happy to edit it if you prefer the Weaviate way.

Currently, I was thinking something like this:

query_args = {
    "n_probes": [...],
    "refine_factor": [...],
    "text": [...],
    "metric": [...],
    "filter": [...]
}
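
As a rough sketch of how a dict of lists like this could drive an experiment, the snippet below crosses every list with the others to produce one argument set per run. The thread does not spell out that the lists are combined as a Cartesian product, so treat that as an assumption. `expand_query_args` is a hypothetical helper and the values in `example_args` are placeholders.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_query_args(query_args: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per combination of the listed values (assumed behavior)."""
    keys = list(query_args)
    for values in product(*(query_args[key] for key in keys)):
        yield dict(zip(keys, values))


# Placeholder values chosen for illustration only.
example_args = {
    "n_probes": [10, 20],
    "refine_factor": [None, 5],
    "metric": ["cosine"],
}

combos = list(expand_query_args(example_args))
for combo in combos:
    print(combo)
```

Each resulting dict could then be passed to the query function as keyword arguments, so adding a new parameter only means adding another key.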

NivekT commented 11 months ago

Thanks for the contribution and answers. We can merge this as it is and expand it based on user requests.