mayooear / gpt4-pdf-chatbot-langchain

GPT4 & LangChain Chatbot for large PDF docs
https://www.youtube.com/watch?v=ih9PBGVVOO4

Add a filter/filters on elasticsearch vectorstore #416

Closed: LaurensTsestigh closed this issue 8 months ago

LaurensTsestigh commented 8 months ago

Is there a way to add one or more filters to the vector store retriever? We don't want to search embeddings that aren't linked to the file we are chatting with, so we need to add a fileId filter when querying Elasticsearch.

I tried it like this, but no documents are retrieved:

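import { ConversationalRetrievalQAChain } from 'langchain/chains';

// model, vectorstore, QA_TEMPLATE and CONDENSE_TEMPLATE are defined elsewhere in the app
const chain = ConversationalRetrievalQAChain.fromLLM(
  model,
  vectorstore.asRetriever({
    // intended to restrict retrieval to chunks whose fileId metadata matches this file
    filter: { fileId: 'b122f75b-9b22-41d4-91f2-90a7d038f5bf' },
  }),
  {
    verbose: true,
    qaTemplate: QA_TEMPLATE,
    questionGeneratorTemplate: CONDENSE_TEMPLATE,
    // also return the retrieved chunks; how many is set by the retriever's k (4 by default)
    returnSourceDocuments: true,
  },
);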

Thanks in advance!

dosubot[bot] commented 8 months ago

🤖

Hello,

From the context provided, the gpt4-pdf-chatbot-langchain codebase appears to use Pinecone as its vector store, not Elasticsearch, so a filter aimed at Elasticsearch might not work as expected.

If you are switching to Elasticsearch as your vector store, you can add filters to your queries, and the filter you are passing looks correct in principle. The problem may instead be how the fileId is associated with each document in the vector store.

The context provided doesn't show how the fileId is attached to each document in the vector store. If it isn't stored with every document, the filter has nothing to match and no documents will be retrieved.
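One way to make a fileId filter usable is to stamp the ID onto every chunk's metadata before the chunks are embedded and indexed. The sketch below assumes a LangChain-style document shape (`pageContent` plus a `metadata` object); the `ChunkDoc` type and the `tagWithFileId` helper are hypothetical names for illustration, not LangChain APIs.

```typescript
// Minimal document shape modeled loosely on LangChain's Document
// (pageContent + metadata); this type is an assumption, not the library's.
type ChunkDoc = {
  pageContent: string;
  metadata: Record<string, unknown>;
};

// Hypothetical helper (not a LangChain API): returns copies of the chunks
// with `fileId` added to each chunk's metadata, leaving the input untouched.
function tagWithFileId(docs: ChunkDoc[], fileId: string): ChunkDoc[] {
  return docs.map((doc) => ({
    pageContent: doc.pageContent,
    metadata: { ...doc.metadata, fileId },
  }));
}

const chunks: ChunkDoc[] = [
  { pageContent: "Page 1 text", metadata: { page: 1 } },
  { pageContent: "Page 2 text", metadata: { page: 2 } },
];

const tagged = tagWithFileId(chunks, "b122f75b-9b22-41d4-91f2-90a7d038f5bf");
console.log(tagged.map((doc) => doc.metadata));
```

The tagged chunks would then be handed to whatever method the vector store uses to add documents (for example `addDocuments`), so each indexed record carries its fileId.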

Here's a general example of how you might add a filter to an Elasticsearch query:

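import { Client } from '@elastic/elasticsearch';

// Assumes a v7 @elastic/elasticsearch client and a cluster at localhost:9200;
// v7 wraps the response in { body } (v8 returns the body directly)
const client = new Client({ node: 'http://localhost:9200' });

async function searchByFileId() {
  const { body } = await client.search({
    index: 'my-index',
    body: {
      query: {
        bool: {
          must: { match_all: {} },
          // term is an exact match, so fileId should be mapped as keyword
          filter: { term: { fileId: 'b122f75b-9b22-41d4-91f2-90a7d038f5bf' } },
        },
      },
    },
  });
  return body.hits.hits;
}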

In this example, the filter clause keeps only the documents whose fileId matches 'b122f75b-9b22-41d4-91f2-90a7d038f5bf'.
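For vector search specifically, Elasticsearch 8.x accepts a `filter` inside the top-level `knn` section of a search request, so the nearest-neighbour search only considers matching documents. The sketch below only builds the request body as a plain object; the index name, the `embedding` field, the `metadata.fileId` path, and the `buildKnnRequest` helper are assumptions for illustration.

```typescript
// Shape of an Elasticsearch 8.x kNN search request restricted to one file.
// Field names ("embedding", "metadata.fileId") are assumptions; use the
// names from your own index mapping.
interface KnnRequest {
  index: string;
  size: number;
  knn: {
    field: string;
    query_vector: number[];
    k: number;
    num_candidates: number;
    filter: { term: Record<string, string> };
  };
}

// Hypothetical helper: builds the request body only, it sends nothing.
function buildKnnRequest(
  index: string,
  queryVector: number[],
  fileId: string,
  k = 4,
): KnnRequest {
  return {
    index,
    size: k,
    knn: {
      field: "embedding",
      query_vector: queryVector,
      k,
      // num_candidates must be at least k; 10 * k is an arbitrary choice here
      num_candidates: k * 10,
      // only documents whose metadata.fileId matches are considered neighbours
      filter: { term: { "metadata.fileId": fileId } },
    },
  };
}

const request = buildKnnRequest(
  "my-index",
  [0.1, 0.2, 0.3],
  "b122f75b-9b22-41d4-91f2-90a7d038f5bf",
);
console.log(JSON.stringify(request, null, 2));
```

With the official `@elastic/elasticsearch` v8 client, an object of this shape should be passable to `client.search()`; the term filter only matches exactly if `metadata.fileId` is mapped as `keyword`.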

Please ensure that the fileId is correctly associated with each document in your vector store. If you're still having trouble, could you provide more information about how you're associating the fileId with each document? This will help me provide a more accurate solution.

Best regards, Dosu


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

LaurensTsestigh commented 8 months ago

@dosu-bot We can already add filters to an Elasticsearch query ourselves. The problem is that ConversationalRetrievalQAChain uses the vector store as a retriever and runs the query for us, so I need a way to tell the vector store to apply a fileId filter (or other filters) when it queries Elasticsearch.
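Whatever the retriever does internally, a filter object like `{ fileId: 'b122f75b-9b22-41d4-91f2-90a7d038f5bf' }` has to be turned into Elasticsearch clauses at some point. The sketch below shows one plausible translation for several equality filters combined with AND; the `metadata.` prefix assumes the store keeps metadata under a top-level `metadata` object, and `filterToBoolQuery` is a hypothetical name, not LangChain's actual implementation.

```typescript
type FilterValue = string | number | boolean;

// Hypothetical translation of a flat filter object into an Elasticsearch
// bool query: each key becomes a term clause on `metadata.<key>`, and all
// clauses must match (logical AND). The "metadata." prefix is an assumption
// about how the store lays out documents.
function filterToBoolQuery(filter: Record<string, FilterValue>) {
  const clauses = Object.entries(filter).map(([key, value]) => ({
    term: { [`metadata.${key}`]: value },
  }));
  // bool.filter clauses run in filter context: they do not affect scoring
  return { bool: { filter: clauses } };
}

const query = filterToBoolQuery({
  fileId: "b122f75b-9b22-41d4-91f2-90a7d038f5bf",
  language: "en",
});
console.log(JSON.stringify(query));
```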

dosubot[bot] commented 8 months ago

🤖

Hello,

From your question, it seems you want to add a fileId filter to the Elasticsearch query issued through the vectorstore.asRetriever() method. Without seeing how vectorstore.asRetriever() is implemented, I can't give a specific solution.

In general, to add a filter to an Elasticsearch query, you would include a filter clause in the bool query. Here's a basic example:

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "fileId": "<your_file_id>"
        }
      }
    }
  }
}

This query would return only the documents that have the specified fileId.

If vectorstore.asRetriever() is a method that you've defined, you would need to modify it to include this filter when it constructs the Elasticsearch query. If it's a method from a library, you would need to check the library's documentation to see if it supports adding filters to the query.
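If the retriever is your own code, the filter only needs to travel from the retriever to the search call. The sketch below uses a hypothetical `VectorSearch` interface and an in-memory fake in place of Elasticsearch, so it demonstrates the plumbing only; `FilteredRetriever`, `FakeStore`, and the interface are illustrative names, not library types.

```typescript
type Metadata = Record<string, string>;
interface Doc {
  pageContent: string;
  metadata: Metadata;
}

// Hypothetical search interface: any store that can apply an equality filter.
interface VectorSearch {
  similaritySearch(query: string, k: number, filter?: Metadata): Promise<Doc[]>;
}

// Retriever that remembers a filter and forwards it on every query.
class FilteredRetriever {
  private readonly store: VectorSearch;
  private readonly filter: Metadata;
  private readonly k: number;

  constructor(store: VectorSearch, filter: Metadata, k = 4) {
    this.store = store;
    this.filter = filter;
    this.k = k;
  }

  getRelevantDocuments(query: string): Promise<Doc[]> {
    return this.store.similaritySearch(query, this.k, this.filter);
  }
}

// In-memory fake standing in for Elasticsearch: it ignores similarity and
// only applies the metadata filter, which is enough to show the plumbing.
class FakeStore implements VectorSearch {
  private readonly docs: Doc[];

  constructor(docs: Doc[]) {
    this.docs = docs;
  }

  async similaritySearch(_query: string, k: number, filter?: Metadata): Promise<Doc[]> {
    const matches = this.docs.filter((doc) =>
      Object.entries(filter ?? {}).every(([key, value]) => doc.metadata[key] === value),
    );
    return matches.slice(0, k);
  }
}

const store = new FakeStore([
  { pageContent: "from file A", metadata: { fileId: "a" } },
  { pageContent: "from file B", metadata: { fileId: "b" } },
]);
const retriever = new FilteredRetriever(store, { fileId: "a" });
retriever.getRelevantDocuments("anything").then((docs) => {
  console.log(docs.map((doc) => doc.pageContent));
});
```

In LangChain JS, the `filter` option the question passes to `asRetriever` plays this role; whether it reaches Elasticsearch depends on the vector store integration supporting filters.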

To provide a more specific solution, could you please share the implementation of vectorstore.asRetriever() or the relevant Elasticsearch query code? This would help me understand how the method is currently working and how you might be able to add the fileId filter.

Best, Dosu

LaurensTsestigh commented 8 months ago

We had a small mistake in our Elasticsearch mapping: the metadata field was mapped as nested, which was causing the problem.
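This fix is consistent with how Elasticsearch treats `nested` fields: each nested object is indexed as a separate hidden sub-document, and a plain `term` query on `metadata.fileId` does not reach those sub-documents unless it is wrapped in a `nested` query. A store that issues plain term filters therefore matches nothing. The sketch below contrasts the two mappings and shows the wrapper a nested mapping would need; the field names and the `text` field are assumptions for illustration.

```typescript
// Mapping that works with plain term filters: metadata is an ordinary object,
// so "metadata.fileId" is a normal keyword field on the top-level document.
const objectMapping = {
  properties: {
    text: { type: "text" },
    metadata: {
      type: "object",
      properties: { fileId: { type: "keyword" } },
    },
  },
};

// Mapping like the one that caused the problem: metadata as "nested".
// Each metadata object becomes a hidden sub-document that plain term
// queries on the parent document cannot see.
const nestedMapping = {
  properties: {
    text: { type: "text" },
    metadata: {
      type: "nested",
      properties: { fileId: { type: "keyword" } },
    },
  },
};

// Plain term filter: matches under objectMapping, matches nothing under nestedMapping.
const plainFilter = {
  term: { "metadata.fileId": "b122f75b-9b22-41d4-91f2-90a7d038f5bf" },
};

// What a nested mapping would require instead: the term wrapped in a nested query.
const nestedFilter = {
  nested: { path: "metadata", query: plainFilter },
};

console.log(JSON.stringify({ objectMapping, nestedMapping, plainFilter, nestedFilter }, null, 2));
```

Elasticsearch does not allow changing an existing field's type in place, so moving `metadata` from `nested` to `object` generally means creating a new index with the corrected mapping and reindexing the documents.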