Open CShorten opened 9 months ago
100%. Would love to support this. Does it need to be a general interface? I guess so because dspy.Retrieve is currently provider-agnostic.
The easy way to do this right away is to use the Weaviate class inside the module directly. That way it’s used as a tool, not as a retriever (not plug n play with other retrievers, and it will not receive any automatic optimization for retrieval if we ever add any). But currently all optimizers focus on the LM, not the retriever itself, so there’s no real harm in this except interoperability.
What do you think Connor? Do we just pass **kwargs to the underlying provider anyway? And only some providers will implement all features like filters?
[Tool vs. Retrieve]
Ah, I think a very interesting distinction could be made between dspy.Retrieve and dspy.Tool.
Can you please catch me up on how the Python Interpreter is interfaced? Or maybe a simple calculator is a better example.
Maybe Retrieve inherits Tool?
[kwargs]
I think **kwargs gives us the most flexibility to, exactly as you mention, offer features supported in some retrievers and not others while keeping a fairly standard interface. I think if we did it this way it would have the lowest risk of breaking changes with whatever we want to do next.
[Interface with DSPy Compiler]
I am imagining you could optimize a metadata filter with something like this:
class QueryToFilter(dspy.Module):
    def __init__(self):
        super().__init__()
        # Probably better to use a Signature for this one that describes the cardinality of the filter
        self.query_to_filter = dspy.Predict("query -> metadata_filter_speaker")
        self.retrieve = dspy.Retrieve(k=3)
        # ...

    def forward(self, question):
        # dspy.Predict expects keyword arguments named after the signature's input fields
        speaker_filter = self.query_to_filter(query=question).metadata_filter_speaker
        # Interface filter cardinality with DSPy Assertions
        dspy.Assert(speaker_filter, "Filter must be one of ['Omar Khattab', 'Bob van Luijt', 'Etienne Dilocker', ...]")
        # `filter=` is the proposed argument, not something dspy.Retrieve supports today
        contexts = self.retrieve(query=question, filter=speaker_filter)
        # ...
What?
Interface metadata filters in dspy.Retrieve classes such as the Weaviate / Mongo / Pinecone / Qdrant / Chroma RMs.
Why?
Symbolic filters can be used to improve vector search results.
For example, we may only want podcast clips about "DSPy" where the "speaker" is "Omar Khattab".
How?
In the forward call. So in the forward pass of DSPy Modules, we would see something like:
contexts = self.weaviate_retriever(query, filters={"speaker": "Omar Khattab"})
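For intuition about what such a call could do underneath, here is a toy, in-memory sketch of filtered search. It is not how Weaviate or any other RM implements filtering: the `search` function, the CLIPS data, and the word-overlap score (a stand-in for vector similarity) are all assumptions made for illustration.

```python
# Toy podcast clips; the texts and metadata are made up for this example.
CLIPS = [
    {"text": "DSPy compiles prompts", "speaker": "Omar Khattab"},
    {"text": "DSPy with Weaviate", "speaker": "Bob van Luijt"},
    {"text": "ColBERT late interaction", "speaker": "Omar Khattab"},
]


def search(clips, query, filters=None, k=3):
    """Apply symbolic metadata filters first, then rank the survivors."""
    filters = filters or {}
    # Symbolic step: exact-match every filter key against clip metadata.
    candidates = [
        clip for clip in clips
        if all(clip.get(key) == value for key, value in filters.items())
    ]
    query_terms = set(query.lower().split())

    def score(clip):
        # Word overlap stands in for a vector-similarity score.
        return len(query_terms & set(clip["text"].lower().split()))

    ranked = sorted(candidates, key=score, reverse=True)
    return [clip["text"] for clip in ranked[:k]]


contexts = search(CLIPS, "DSPy", filters={"speaker": "Omar Khattab"}, k=1)
```

Filtering before ranking means the top-k slots are never spent on clips from the wrong speaker, which is the improvement the "Why?" section describes.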
Additional Comments
In the future we may also want to interface a filter only without a search query. For example, if we want to see the most recent 2 podcasts without any sorting based on relevance to a query.
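A filter-only lookup could look roughly like the sketch below, again over toy in-memory data. The `fetch` function, its `sort_by` and `limit` parameters, and the PODCASTS records (including their dates) are hypothetical and chosen only to illustrate "most recent 2 podcasts" with no query involved.

```python
from datetime import date

# Made-up podcast records for illustration.
PODCASTS = [
    {"title": "Episode A", "speaker": "Omar Khattab", "published": date(2024, 1, 10)},
    {"title": "Episode B", "speaker": "Bob van Luijt", "published": date(2024, 3, 5)},
    {"title": "Episode C", "speaker": "Etienne Dilocker", "published": date(2024, 2, 20)},
]


def fetch(items, filters=None, sort_by=None, descending=True, limit=2):
    """Return items matching the filters, with no relevance ranking at all."""
    filters = filters or {}
    matches = [
        item for item in items
        if all(item.get(key) == value for key, value in filters.items())
    ]
    if sort_by is not None:
        # Order by a metadata field (e.g. publication date) instead of a query score.
        matches = sorted(matches, key=lambda item: item[sort_by], reverse=descending)
    return matches[:limit]


latest = fetch(PODCASTS, sort_by="published", limit=2)
latest_titles = [podcast["title"] for podcast in latest]
```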