run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Is there a way to filter in a query engine the same way you can filter in a normal search #16631

Status: Open. Neutel opened this issue 1 month ago

Neutel commented 1 month ago


Question

The search endpoint lets me filter with `search.ismatch`, which accepts a Lucene query syntax that supports regular expressions. I need to do something similar in the query engine but can't find a way to do so. The filter I use on the `/search` endpoint is `search.ismatch('/,.{groupId},./', 'groups', 'full', 'any')`, where `groups` is a comma-separated string of group IDs.
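For reference, the intent of that filter can be sketched client-side in Python: a group ID should match only as a whole comma-delimited entry, so `group1` must not match `group10`. This is an assumption about the intended semantics (the `search.ismatch` syntax appears to be Azure AI Search's), and `has_group` is a hypothetical helper, not Azure AI Search or LlamaIndex code.

```python
import re


def has_group(groups: str, group_id: str) -> bool:
    """Return True if group_id is one whole entry in a comma-separated string.

    Hypothetical helper emulating a delimiter-aware regex filter on `groups`.
    """
    # The ID must sit between a comma or a string boundary on each side;
    # re.escape makes characters such as "." match literally.
    pattern = rf"(?:^|,)\s*{re.escape(group_id)}\s*(?:,|$)"
    return re.search(pattern, groups) is not None


print(has_group("group1,group2,group3", "group2"))  # True
print(has_group("group10,group11", "group1"))       # False
```

Anchoring on the delimiters, rather than searching for the bare ID, is what keeps prefixes like `group1` from matching longer IDs.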

dosubot[bot] commented 1 month ago

Hey there, @Neutel! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together while we wait for a human maintainer.

Yes, you can filter in a query engine in a way similar to the Lucene-syntax `ismatch` filter on the search endpoint. The OpenSearch vector store supports filter-context queries, and you can pass `ExactMatchFilter` objects through `MetadataFilters` to apply them. Here's an example of a query engine that only searches footnotes matching certain conditions:

```python
import re

from llama_index.core import Document
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# Assumes `documents` (the loaded source documents) and `index` (a
# VectorStoreIndex backed by the OpenSearch vector store) already exist.

# A footnote is a paragraph that starts with a bracketed number, e.g. "[12]".
FOOTNOTE_PATTERN = re.compile(r"^\s*\[\d+\]\s*")

# Split the text into paragraphs.
text_chunks = documents[0].text.split("\n\n")

# Create a document for each footnote; every chunk kept here is a footnote.
footnotes = [
    Document(
        text=chunk,
        id=documents[0].doc_id,
        metadata={"is_footnote": True},
    )
    for chunk in text_chunks
    if FOOTNOTE_PATTERN.search(chunk)
]

# Insert the footnotes into the index.
for f in footnotes:
    index.insert(f)

# Create a query engine that only searches certain footnotes. In the
# OpenSearch integration this demo targets, each filter's key/value pair is
# sent as a raw query clause: a `term` match on the metadata field and a
# Lucene `query_string` over the content.
footnote_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="term", value='{"metadata.is_footnote": "true"}'
            ),
            ExactMatchFilter(
                key="query_string",
                value='{"query": "content: space AND content: lisp"}',
            ),
        ]
    )
)

res = footnote_query_engine.query(
    "What did the author say about space aliens and lisp?"
)
print(res.response)
```

This example demonstrates how to use filters to narrow down search results based on specific metadata and query conditions [1].
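The preprocessing step in that example can be sketched with the standard library alone. Plain dicts stand in for LlamaIndex `Document` objects (an assumption so the sketch runs without LlamaIndex), and `extract_footnotes` is a hypothetical helper; it shows which paragraphs the footnote pattern keeps and the metadata each one carries.

```python
import re

# Same pattern as the example: a paragraph that starts with "[<digits>]".
FOOTNOTE_PATTERN = re.compile(r"^\s*\[\d+\]\s*")


def extract_footnotes(text: str) -> list[dict]:
    """Split text into paragraphs and keep only the footnotes.

    Plain dicts stand in for llama_index Document objects (assumption).
    """
    chunks = text.split("\n\n")
    return [
        {"text": chunk, "metadata": {"is_footnote": True}}
        for chunk in chunks
        if FOOTNOTE_PATTERN.search(chunk)
    ]


sample = "Main essay text.\n\n[1] A footnote about Lisp.\n\nMore text.\n\n [2] Another note."
for note in extract_footnotes(sample):
    print(note["text"].strip())
```

Only the two bracketed paragraphs survive; leading whitespace before the bracket is allowed by the `^\s*` prefix.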

To continue talking to Dosu, mention @dosu.

Neutel commented 1 month ago

@dosu the `groups` field is a comma-separated string, e.g. `group1,group2,group3`, and the filter I need matches documents where a certain group is in that field, so I think ExactMatchFilter doesn't quite work.
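The mismatch can be sketched in plain Python: an exact-match filter compares the whole stored string, while the question needs per-group membership. One common workaround, offered here as an assumption rather than a confirmed fix, is to store `groups` as a list at ingestion time; LlamaIndex's `FilterOperator` includes list-oriented operators such as `CONTAINS` and `ANY`, but whether a given vector store supports them varies. `groups_to_list` and `build_metadata` are hypothetical helpers.

```python
def groups_to_list(groups: str) -> list[str]:
    """Turn "group1, group2,group3" into ["group1", "group2", "group3"]."""
    return [g.strip() for g in groups.split(",") if g.strip()]


def build_metadata(groups: str) -> dict:
    # Hypothetical: store groups as a list so a store that supports
    # list-valued metadata can match individual group IDs.
    return {"groups": groups_to_list(groups)}


raw = "group1,group2,group3"
print(raw == "group2")                  # False: exact match on the whole string
print("group2" in groups_to_list(raw))  # True: membership in the list
```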

dosubot[bot] commented 1 month ago

To filter documents where a field holds several group IDs and match whenever a given group is present, you can use an any-of match instead of an exact match. In LlamaIndex's Qdrant integration, for example, list-style filters are translated into Qdrant's `MatchAny` condition (`FieldCondition` and `MatchAny` are Qdrant client models), which matches a field against any of the values in a list. Here's the relevant part of that translation:

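```python
from qdrant_client.http.models import FieldCondition, MatchAny

# Fragment of a filter-building helper: `subfilter` is a LlamaIndex
# MetadataFilter (e.g. key="groups"), `values` is a list such as
# ["group1", "group2"], and `conditions` collects Qdrant conditions.
conditions.append(
    FieldCondition(
        key=subfilter.key,
        match=MatchAny(any=values),
    )
)
```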

In this snippet, `subfilter.key` is the field to filter on and `values` is the list of values to match. Note that `MatchAny` compares whole values: it matches when the stored value, or any element of a list-valued field, equals one of `values`. It does not look inside a comma-separated string, so this approach replaces `ExactMatchFilter` only if `groups` is stored as a list (for example `["group1", "group2", "group3"]`) [1].
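A pure-Python sketch of any-of matching, modeled on how Qdrant documents `MatchAny` (a stored value, or any element of a stored list, must equal one of the given values), shows why the field's format matters. `match_any` is a hypothetical helper, not Qdrant's implementation.

```python
from typing import Any, Sequence


def match_any(stored: Any, values: Sequence[Any]) -> bool:
    """Emulate an any-of match: compare whole values, never substrings."""
    if isinstance(stored, list):
        # List-valued field: at least one element must be in `values`.
        return any(item in values for item in stored)
    # Scalar field: the whole value must equal one of `values`.
    return stored in values


print(match_any("group1,group2,group3", ["group2"]))          # False
print(match_any(["group1", "group2", "group3"], ["group2"]))  # True
```

The comma-separated string fails because it is compared as one opaque value; the list succeeds because each element is compared on its own.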
