run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Documentation]: OpenSearch Metadata Filtering doesn't work #15166

Open sad-mathematician opened 1 month ago

sad-mathematician commented 1 month ago

Documentation Issue Description

The documentation page contains code for indexing and querying an OpenSearch database, but the section that shows querying with metadata filtering doesn't work. Specifically, this block throws an error:

# Create a query engine that only searches certain footnotes.
footnote_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="term", value='{"metadata.is_footnote": "true"}'
            ),
            ExactMatchFilter(
                key="query_string",
                value='{"query": "content: space AND content: lisp"}',
            ),
        ]
    )
)

res = footnote_query_engine.query(
    "What did the author say about space aliens and lisp?"
)
res.response

The error thrown is:

RequestError: RequestError(400, 'x_content_parse_exception', '[term] query malformed, no start_object after query name')

There appears to be a problem with how the query sent to the OpenSearch client is constructed.

Documentation Link

https://docs.llamaindex.ai/en/stable/examples/vector_stores/OpensearchDemo/

dosubot[bot] commented 1 month ago

To fix the error in the provided code block for querying with metadata filtering in an OpenSearch database, you need to ensure that the filters are correctly formatted according to OpenSearch's query DSL. The ExactMatchFilter should be used to match specific terms, not to pass JSON strings directly.

Here is the corrected code block:

# Create a query engine that only searches certain footnotes.
footnote_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="metadata.is_footnote", value="true"
            ),
            ExactMatchFilter(
                key="content", value="space"
            ),
            ExactMatchFilter(
                key="content", value="lisp"
            ),
        ]
    )
)

res = footnote_query_engine.query(
    "What did the author say about space aliens and lisp?"
)
res.response

In this corrected version, the ExactMatchFilter is used to match specific fields and values directly, rather than passing JSON strings. This should resolve the RequestError related to the malformed query.

Make sure that the metadata fields (metadata.is_footnote, content) exist in your indexed documents and are correctly formatted.

For more details on how ExactMatchFilter translates to OpenSearch's query DSL, you can refer to the documentation on metadata filtering and ensure that your filters are structured correctly [1][2].

To continue talking to Dosu, mention @dosu.

sad-mathematician commented 1 month ago

@dosu the updated snippet throws the same error: RequestError: RequestError(400, 'x_content_parse_exception', '[metadata.is_footnote] query malformed, no start_object after query name')

dosubot[bot] commented 1 month ago

To resolve the RequestError: RequestError(400, 'x_content_parse_exception', '[metadata.is_footnote] query malformed, no start_object after query name') error, ensure that the metadata filter is correctly formatted as a JSON object. The error suggests that the query is not properly structured.

Here is the correct way to add a metadata filter to the query engine:

from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

footnote_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[MetadataFilter(key="metadata.is_footnote", value="true")]
    )
)

res = footnote_query_engine.query(
    "What did the author say about space aliens and lisp?"
)
res.response

Make sure that the MetadataFilter and MetadataFilters are correctly imported and used as shown above. The filters parameter should be a list of MetadataFilter objects, each specifying a key-value pair for the metadata you want to filter by.

Additionally, ensure that the _parse_filters function in your OpenSearch vector store implementation correctly parses the filters. Here is an example of how the _parse_filters function should look:

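import json
from typing import Any, Optional

from llama_index.core.vector_stores import MetadataFilters

def _parse_filters(filters: Optional[MetadataFilters]) -> Any:
    # Each filter becomes {key: <JSON-decoded value>}.
    pre_filter = []
    if filters is not None:
        for f in filters.legacy_filters():
            pre_filter.append({f.key: json.loads(str(f.value))})

    return pre_filter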

This function converts the filters into a format that OpenSearch can understand, ensuring that the metadata filters are correctly applied in your queries [1][2].
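
To check what this transformation produces for the two filter styles tried in this thread, it can be run by hand. The sketch below assumes the parsing logic is exactly the function quoted above; `LegacyFilter` and `parse_filters` are hypothetical stand-ins written for illustration, not LlamaIndex APIs.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class LegacyFilter:
    """Hypothetical stand-in for a key/value metadata filter."""

    key: str
    value: str


def parse_filters(filters: List[LegacyFilter]) -> List[dict]:
    # Same transformation as the quoted _parse_filters: the key becomes the
    # clause name and the JSON-decoded value becomes its body.
    return [{f.key: json.loads(str(f.value))} for f in filters]


# Raw-DSL style: the key is a query type, the value is a JSON object string.
dsl_style = parse_filters(
    [LegacyFilter(key="term", value='{"metadata.is_footnote": "true"}')]
)

# Field/value style: the key is a field name, the value is a bare JSON scalar.
field_style = parse_filters(
    [LegacyFilter(key="metadata.is_footnote", value="true")]
)

print(dsl_style)    # [{'term': {'metadata.is_footnote': 'true'}}]
print(field_style)  # [{'metadata.is_footnote': True}]
```

Under that assumption, the field/value form puts a field name where OpenSearch expects a query type followed by an object, which is consistent with the "[metadata.is_footnote] query malformed, no start_object after query name" error reported above. It does not by itself explain the original "[term]" error, which suggests the value reached OpenSearch as something other than a JSON object in the installed version.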


GeorgeDittmar commented 3 weeks ago

I'm running into the same issue when trying to filter on metadata; it throws the exact same error. @dosu are you sure this is right?

GeorgeDittmar commented 3 weeks ago

@dosu The examples given do not appear to work. Is there any way to view the actual query object that's sent to OpenSearch?
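
One generic way to inspect outgoing request bodies is to turn on debug logging for the client library. This is a minimal sketch, assuming the OpenSearch Python client reports requests through a standard `logging` logger named "opensearch"; that name is an assumption and should be checked against the installed client version.

```python
import logging

# Assumption: the client library emits request details on this logger name.
CLIENT_LOGGER_NAME = "opensearch"

# Send log records to stderr with the logger name included.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s")

# Raise verbosity for the client logger only.
client_logger = logging.getLogger(CLIENT_LOGGER_NAME)
client_logger.setLevel(logging.DEBUG)
```

If this is set up before the query runs, any request bodies the client logs at debug level should appear on stderr.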