[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
from langchain_community.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import OpenAI
import os

# PGVector reads its connection string from the PGVECTOR_CONNECTION_STRING
# environment variable; the OpenAI clients read OPENAI_API_KEY.
collection = "example_collection"
embeddings = OpenAIEmbeddings()


def load_example_docs(search_text):
    docs = [
        Document(
            page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
            metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
        ),
        Document(
            page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
            metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
        ),
        Document(
            page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
            metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
        ),
        Document(
            page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
            metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
        ),
        Document(
            page_content="Toys come alive and have a blast doing so",
            # Only change from the documentation example: "director" was added here.
            metadata={"year": 1995, "genre": "animated", "director": "Andrei Tarkovsky"},
        ),
        Document(
            page_content="Three men walk into the Zone, three men walk out of the Zone",
            metadata={
                "year": 1979,
                "director": "Andrei Tarkovsky",
                "genre": "science fiction",
                "rating": 9.9,
            },
        ),
    ]
    # Note: each call adds these documents to the collection again
    # (pre_delete_collection defaults to False).
    vectorstore = PGVector.from_documents(
        docs,
        embeddings,
        collection_name=collection
    )
    metadata_field_info = [
        AttributeInfo(
            name="genre",
            description="The genre of the movie",
            type="string or list[string]",
        ),
        AttributeInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        AttributeInfo(
            name="director",
            description="The name of the movie director",
            type="string",
        ),
        AttributeInfo(
            name="rating", description="A 1-10 rating for the movie", type="float"
        ),
    ]
    document_content_description = "Brief summary of a movie"
    llm = OpenAI(temperature=0)
    retriever = SelfQueryRetriever.from_llm(
        llm, vectorstore, document_content_description, metadata_field_info, verbose=True
    )
    results = retriever.invoke(search_text)
    print(results)


# example 1
load_example_docs("What's a movie that's all about toys released in 1995 of genre animated and directed by Andrei Tarkovsky")
# example 2
load_example_docs("Has Greta Gerwig directed any movies about women")
# example 3
load_example_docs("I want to watch a movie rated higher than 8.5")
# example 4
load_example_docs("What's a highly rated (above 8.5) science fiction film?")
# example 5
load_example_docs("What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated")
Error Message and Stack Trace (if applicable)
No response
Description
SelfQueryRetriever returns an empty result for a composite filter combined with a query. In the code above, for example 1, the LLM generates the filter and its arguments correctly. Here is the output from the LLM:
However, SelfQueryRetriever returns an empty result even though Document 5 exactly matches both the filter and the query. Example 5 also fails to return the correct document. The code is taken from the LangChain documentation at https://python.langchain.com/v0.1/docs/integrations/retrievers/self_query/pgvector_self_query/. The only change I made is adding "director": "Andrei Tarkovsky" to the metadata of Document 5.
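To make the expected result concrete, here is a minimal in-memory sketch (standard library only, no LangChain or Postgres) that evaluates a filter against the metadata from the example code. It assumes the filter has a nested `{"and": [{attribute: {comparator: value}}, ...]}` shape; the filters for examples 1 and 5 are assumed reconstructions from the question text, not captured LLM output. `METADATA`, `matches`, and `matching_documents` are hypothetical helpers, not LangChain or PGVector APIs.

```python
"""Hypothetical in-memory check of which example documents satisfy a filter.

Assumption: filters use the nested shape {"and": [{attr: {op: value}}, ...]}.
This is an illustration only, not the PGVector / SelfQueryRetriever code.
"""
from typing import Any, Dict, List

# Metadata copied from the six documents in the example code (index 0 = Document 1).
METADATA: List[Dict[str, Any]] = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated", "director": "Andrei Tarkovsky"},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "science fiction", "rating": 9.9},
]

# Comparators used in this sketch.
COMPARATORS = {
    "eq": lambda actual, expected: actual == expected,
    "gt": lambda actual, expected: actual > expected,
    "lt": lambda actual, expected: actual < expected,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the (hypothetical) filter `flt`."""
    if "and" in flt:
        return all(matches(metadata, sub) for sub in flt["and"])
    if "or" in flt:
        return any(matches(metadata, sub) for sub in flt["or"])
    # Leaf of the form {attribute: {comparator: value}}.
    ((attribute, condition),) = flt.items()
    ((op, expected),) = condition.items()
    # A document without the attribute never matches the comparison.
    if attribute not in metadata:
        return False
    return COMPARATORS[op](metadata[attribute], expected)


def matching_documents(flt: Dict[str, Any]) -> List[int]:
    """Return 1-based document numbers (as used in the issue text) that satisfy `flt`."""
    return [i + 1 for i, md in enumerate(METADATA) if matches(md, flt)]


# Assumed filter for example 1: year == 1995 AND genre == "animated"
# AND director == "Andrei Tarkovsky".
EXAMPLE_1 = {"and": [
    {"year": {"eq": 1995}},
    {"genre": {"eq": "animated"}},
    {"director": {"eq": "Andrei Tarkovsky"}},
]}

# Assumed filter for example 5: 1990 < year < 2005 AND genre == "animated".
EXAMPLE_5 = {"and": [
    {"year": {"gt": 1990}},
    {"year": {"lt": 2005}},
    {"genre": {"eq": "animated"}},
]}

print(matching_documents(EXAMPLE_1))  # [5]
print(matching_documents(EXAMPLE_5))  # [5]
```

Under these assumed filters, Document 5 is the only document that passes in both cases, so the retriever would be expected to return it rather than an empty list.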
System Info
langchain==0.1.20 langchain-community==0.0.38 langchain-core==0.1.52 langchain-openai==0.1.6
Platform: Ubuntu