langchain-ai / langchain-weaviate

MIT License

Explanation: RAG and filter #190

Closed zymbuzz closed 4 months ago

zymbuzz commented 5 months ago

Hello, thanks a lot for this project.

I am currently exploring the functionality of the package. After reading the documentation, I was not sure how to pass the filter when doing RAG.

It is documented how to pass the filter for the similarity_search function but not when doing the RAG. I would appreciate your help on this.

hsm207 commented 4 months ago

According to the documentation, you can pass filters to similarity_search:

from weaviate.classes.query import Filter

for filter_str in ["blah.txt", "state_of_the_union.txt"]:
    search_filter = Filter.by_property("source").equal(filter_str)
    filtered_search_results = db.similarity_search(query, filters=search_filter)
    print(len(filtered_search_results))
    if filter_str == "state_of_the_union.txt":
        assert len(filtered_search_results) > 0  # There should be at least one result
    else:
        assert len(filtered_search_results) == 0  # There should be no results

I'm not sure what you mean by "filter when doing RAG" vs "not when doing the RAG". Could you please clarify?

zymbuzz commented 4 months ago

It's true that passing a filter to similarity search is well documented. What was unclear was how to add a filter to the retriever. I think I found the solution by doing the following:

search_filter = (
    Filter.by_property("year").less_than(2002)
    & Filter.by_property("year").greater_than(2000)
)
retriever = mydb.as_retriever(search_kwargs={"filters": search_filter, "k": k, "alpha": alpha})

Thanks a lot for your response

hsm207 commented 4 months ago

Thanks for clarifying.

The docs for the Retriever backed by a Vector Store could clarify the generic interface for passing a filter, and note that the exact filter will be vector store specific.
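
For illustration, here is why the search_kwargs approach works, as far as I understand it: the retriever returned by as_retriever keeps the search_kwargs dict and unpacks it into the store's similarity_search call, so a "filters" key ends up as the filters argument. Below is a toy sketch of that forwarding. ToyVectorStore, ToyRetriever and Doc are hypothetical stand-ins, not the real LangChain or Weaviate classes, and a plain Python callable plays the role of the Weaviate Filter.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    year: int


class ToyVectorStore:
    """Hypothetical stand-in for a vector store (not WeaviateVectorStore)."""

    def __init__(self, docs):
        self.docs = list(docs)

    def similarity_search(self, query, k=4, filters=None, **kwargs):
        # A real store would rank by vector similarity; this toy version
        # only applies the filter and truncates to k results.
        # Extra keys such as "alpha" are accepted and ignored here.
        hits = [d for d in self.docs if filters is None or filters(d)]
        return hits[:k]

    def as_retriever(self, search_kwargs=None):
        return ToyRetriever(self, search_kwargs or {})


@dataclass
class ToyRetriever:
    store: ToyVectorStore
    search_kwargs: dict

    def invoke(self, query):
        # The key step: the stored search_kwargs are unpacked into
        # similarity_search, so "filters" arrives as a keyword argument.
        return self.store.similarity_search(query, **self.search_kwargs)


store = ToyVectorStore([Doc("a", 2000), Doc("b", 2001), Doc("c", 2002)])


def year_filter(doc):
    # Plays the role of less_than(2002) & greater_than(2000).
    return 2000 < doc.year < 2002


retriever = store.as_retriever(
    search_kwargs={"filters": year_filter, "k": 2, "alpha": 0.5}
)
results = retriever.invoke("any query")
print([d.text for d in results])
```

Which keys are accepted in search_kwargs, and what a filter object looks like, depends on the concrete vector store's similarity_search signature, which is the store-specific part mentioned above.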

Since this project is about the VectorStore for Weaviate only, I'd hand this over to the langchain team for their consideration. FYI @efriis