Closed HenkBourgonje closed 3 days ago
The issue with the SelfQueryRetriever for the SupabaseVectorStore not returning any documents, despite the filters being created correctly and the documents existing in the database, might be due to the mergeFilters function in the SupabaseTranslator class. Specifically, if both the defaultFilter and generatedFilter are empty, the function returns undefined, so no filter is passed on to the search. With the default "and" merge type, it also returns undefined when only the generatedFilter is empty, which drops the defaultFilter as well.
Here is the relevant part of the code:
mergeFilters(
  defaultFilter: SupabaseFilterRPCCall | SupabaseMetadata | undefined,
  generatedFilter: SupabaseFilterRPCCall | undefined,
  mergeType = "and"
): SupabaseFilterRPCCall | SupabaseMetadata | undefined {
  if (isFilterEmpty(defaultFilter) && isFilterEmpty(generatedFilter)) {
    return undefined;
  }
  if (isFilterEmpty(defaultFilter) || mergeType === "replace") {
    if (isFilterEmpty(generatedFilter)) {
      return undefined;
    }
    return generatedFilter;
  }
  if (isFilterEmpty(generatedFilter)) {
    if (mergeType === "and") {
      return undefined;
    }
    return defaultFilter;
  }
  let myDefaultFilter = defaultFilter;
  if (isObject(defaultFilter)) {
    const { filter } = this.visitStructuredQuery(
      convertObjectFilterToStructuredQuery(defaultFilter)
    );
    // just in case the built filter is empty somehow
    if (isFilterEmpty(filter)) {
      if (isFilterEmpty(generatedFilter)) {
        return undefined;
      }
      return generatedFilter;
    }
    myDefaultFilter = filter;
  }
  // After this point, myDefaultFilter will always be SupabaseFilterRPCCall
  if (mergeType === "or") {
    return (rpc) => {
      const defaultFlattenedParams = ProxyParamsDuplicator.getFlattenedParams(
        rpc,
        myDefaultFilter as SupabaseFilterRPCCall
      );
      const generatedFlattenedParams =
        ProxyParamsDuplicator.getFlattenedParams(rpc, generatedFilter);
      return rpc.or(`${defaultFlattenedParams},${generatedFlattenedParams}`);
    };
  } else if (mergeType === "and") {
    return (rpc) =>
      generatedFilter((myDefaultFilter as SupabaseFilterRPCCall)(rpc));
  } else {
    throw new Error("Unknown merge type");
  }
}
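To make the branch order above easier to follow, here is a minimal sketch that reduces each filter to "empty" or "present" and reports which filter would end up being applied. The names mergeOutcome, MergeType and Outcome are hypothetical and not part of LangChain; the sketch also leaves out the object-to-filter conversion step and the "Unknown merge type" error, so treat it as a reading aid rather than the library's implementation.

```typescript
// Simplified, hypothetical model of the branch order in mergeFilters.
// Filters are reduced to "empty" or "present"; no Supabase client is involved.
type MergeType = "and" | "or" | "replace";
type Outcome =
  | "no filter"
  | "generated only"
  | "default only"
  | "default AND generated"
  | "default OR generated";

function mergeOutcome(
  hasDefault: boolean,
  hasGenerated: boolean,
  mergeType: MergeType = "and"
): Outcome {
  // Both filters empty: nothing to apply.
  if (!hasDefault && !hasGenerated) return "no filter";
  // No default filter, or "replace": only the generated filter can apply.
  if (!hasDefault || mergeType === "replace") {
    return hasGenerated ? "generated only" : "no filter";
  }
  // Default filter present, generated filter empty.
  if (!hasGenerated) {
    return mergeType === "and" ? "no filter" : "default only";
  }
  // Both present: combine according to the merge type.
  return mergeType === "or" ? "default OR generated" : "default AND generated";
}

// Print the full decision table.
const mergeTypes: MergeType[] = ["and", "or", "replace"];
for (const mergeType of mergeTypes) {
  for (const hasDefault of [false, true]) {
    for (const hasGenerated of [false, true]) {
      console.log(
        `${mergeType} default=${hasDefault} generated=${hasGenerated} ->`,
        mergeOutcome(hasDefault, hasGenerated, mergeType)
      );
    }
  }
}
```

The row worth noticing is mergeType "and" with a default filter but an empty generated filter: the outcome is "no filter", so the default filter is not applied on its own in that case.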
Additionally, ensure that your environment variables for Supabase are correctly set, as missing these can also cause issues. Here is a test case that demonstrates the correct setup and usage:
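import { test, expect } from "@jest/globals";
import { createClient } from "@supabase/supabase-js";
import { Document } from "@langchain/core/documents";
import { OpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { AttributeInfo } from "langchain/chains/query_constructor";
import { SelfQueryRetriever } from "langchain/retrievers/self_query";
import { SupabaseTranslator } from "langchain/retrievers/self_query/supabase";
// SupabaseFilter types the callback passed as searchParams.filter
import {
  SupabaseFilter,
  SupabaseVectorStore,
} from "@langchain/community/vectorstores/supabase";

test("Supabase Store Self Query Retriever Test", async () => {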
  const docs = [
    new Document({
      pageContent:
        "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
      metadata: { year: 1993, rating: 7.7, genre: "science fiction" },
    }),
    new Document({
      pageContent:
        "Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
      metadata: { year: 2010, director: "Christopher Nolan", rating: 8.2 },
    }),
    new Document({
      pageContent:
        "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
      metadata: { year: 2006, director: "Satoshi Kon", rating: 8.6 },
    }),
    new Document({
      pageContent:
        "A bunch of normal-sized women are supremely wholesome and some men pine after them",
      metadata: {
        year: 2019,
        director: "Greta Gerwig",
        rating: 8.3,
        genre: "drama",
      },
    }),
    new Document({
      pageContent: "Toys come alive and have a blast doing so",
      metadata: { year: 1995, genre: "animated" },
    }),
    new Document({
      pageContent:
        "Three men walk into the Zone, three men walk out of the Zone",
      metadata: {
        year: 1979,
        director: "Andrei Tarkovsky",
        genre: "science fiction",
        rating: 9.9,
      },
    }),
    new Document({
      pageContent: "10x the previous gecs",
      metadata: {
        year: 2023,
        title: "10000 gecs",
        artist: "100 gecs",
        rating: 9.9,
        type: "album",
      },
    }),
  ];
  const attributeInfo: AttributeInfo[] = [
    {
      name: "genre",
      description: "The genre of the movie",
      type: "string or array of strings",
    },
    {
      name: "year",
      description: "The year the movie was released",
      type: "number",
    },
    {
      name: "director",
      description: "The director of the movie",
      type: "string",
    },
    {
      name: "rating",
      description: "The rating of the movie (1-10)",
      type: "number",
    },
    {
      name: "length",
      description: "The length of the movie in minutes",
      type: "number",
    },
  ];
  if (
    !process.env.SUPABASE_VECTOR_STORE_URL ||
    !process.env.SUPABASE_VECTOR_STORE_PRIVATE_KEY
  ) {
    throw new Error(
      "Supabase URL or private key not set. Please set it in the .env file"
    );
  }
  const embeddings = new OpenAIEmbeddings();
  const llm = new OpenAI();
  const documentContents = "Brief summary of a movie";
  const client = createClient(
    process.env.SUPABASE_VECTOR_STORE_URL,
    process.env.SUPABASE_VECTOR_STORE_PRIVATE_KEY
  );
  const vectorStore = new SupabaseVectorStore(embeddings, { client });
  // idempotency
  const opts = { ids: docs.map((_, idx) => idx) };
  await vectorStore.addDocuments(docs, opts);
  // No default searchParams filter here: none of the sample movies above
  // carries type "movie" in its metadata, so an "and"-merged default filter on
  // metadata->>type would exclude them and the expectations below would fail.
  const selfQueryRetriever = SelfQueryRetriever.fromLLM({
    llm,
    vectorStore,
    documentContents,
    attributeInfo,
    structuredQueryTranslator: new SupabaseTranslator(),
  });
  const query1 = await selfQueryRetriever.getRelevantDocuments(
    "Which movies are less than 90 minutes?"
  );
  console.log(query1);
  expect(query1.length).toEqual(0);
  const query2 = await selfQueryRetriever.getRelevantDocuments(
    "Which movies are rated higher than 8.5?"
  );
  console.log(query2);
  expect(query2.length).toEqual(3);
  const query3 = await selfQueryRetriever.getRelevantDocuments(
    "Which movies are directed by Greta Gerwig?"
  );
  console.log(query3);
  expect(query3.length).toEqual(1);
});
Ensure that the filters are not empty and are correctly merged to retrieve the documents. Additionally, verify that your environment variables are correctly set and that the documents are properly added to the vector store. You can also initialize the retriever with default search parameters that apply in addition to the generated query:
const selfQueryRetriever = SelfQueryRetriever.fromLLM({
  llm,
  vectorStore,
  documentContents,
  attributeInfo,
  structuredQueryTranslator: new SupabaseTranslator(),
  searchParams: {
    filter: (rpc: SupabaseFilter) =>
      rpc.filter("metadata->>type", "eq", "movie"),
    mergeFiltersOperator: "and",
  },
});
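When a filter looks correct but nothing comes back, it can help to check which conditions a filter callback actually applies before involving the database. The sketch below is one way to do that, under stated assumptions: RecordingRpc is a hypothetical stand-in that only mimics the chainable filter(column, operator, value) method of the Supabase query builder, and the generated filter (including its metadata->rating column) is an illustrative guess, not real output from SupabaseTranslator.

```typescript
// Hypothetical stand-in for the Supabase query builder: it records each
// .filter(column, operator, value) call instead of querying a database.
type RecordedCondition = { column: string; operator: string; value: unknown };

class RecordingRpc {
  readonly conditions: RecordedCondition[] = [];

  filter(column: string, operator: string, value: unknown): this {
    this.conditions.push({ column, operator, value });
    return this; // chainable, like the real builder
  }
}

// Same shape as the default filter passed via searchParams above.
const defaultFilter = (rpc: RecordingRpc) =>
  rpc.filter("metadata->>type", "eq", "movie");

// Illustrative stand-in for a generated filter (assumed, not translator output).
const generatedFilter = (rpc: RecordingRpc) =>
  rpc.filter("metadata->rating", "gt", 8.5);

// "and" merge: apply the default filter first, then the generated filter on
// the builder it returns, mirroring the "and" branch of mergeFilters.
const merged = (rpc: RecordingRpc) => generatedFilter(defaultFilter(rpc));

const recorded = merged(new RecordingRpc()).conditions;
console.log(recorded);
```

With an "and" merge, a document has to satisfy every recorded condition, so a default filter on a metadata key that your documents do not carry is enough to produce an empty result.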
See the official docs for more on how to construct metadata filters [1][2][3].
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
No response
Description
I have a RAG chatbot that should answer questions about the inventory of my client's webshop. Self-querying seemed like a good retrieval approach for this use case, because questions about these products will reference their metadata. I am trying to implement it using the SupabaseVectorStore, but it does not work as expected. For example:
This is what the metadata column of a product looks like:
As you can see, these values are defined in the attributeInfo array of my Self Query implementation, and I expect to be able to ask my chatbot questions about them. For example, when I ask "Of what product do you have more than 1000 in stock?", it correctly creates the search filter stating that stock should be greater than 1000 (as seen in the logging, since verbose is set to true). How can it be that, even though the filters are created correctly and the documents exist in the database, the retriever has never returned any documents?
System Info
linux
langchain@0.2.6
@langchain/community@0.2.13
@langchain/core@0.2.9
@langchain/openai@0.2.0
node@v18.16.0