run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.48k stars 4.69k forks source link

[Bug]: RedisVectorStore Hard-Coded Filtration Type Ignores "OR" Condition in MetadataFilters #10702

Closed IIe4enka closed 1 month ago

IIe4enka commented 4 months ago

Bug Description

The _to_redis_filters function within the RedisVectorStore implementation is currently hard-coded to use the & operator for combining multiple metadata filters. This behavior forces all filters to be applied with an "AND" condition, ignoring any other logical condition specified by the user, such as "OR". Specifically, when attempting to apply multiple filters with an "OR" condition using the MetadataFilters class, the method still concatenates filter criteria using the & operator, leading to incorrect query behavior where it should allow for an "OR" condition between filters.

Expected Behavior: The query should return documents that match any of the specified conditions when the 'or' condition is used.

Actual Behavior: The query incorrectly applies 'AND' logic due to the hard-coded & operator, leading to no documents found for filters that should have been combined with 'OR' logic.

Version

llama-index==0.9.46

Steps to Reproduce

Initialize a RedisVectorStore with specific configuration: python

vector_store = RedisVectorStore(
    index_name="test_index",
    index_prefix="llama",
    redis_url="redis://localhost:6379",
    metadata_fields=["document_name", "splitting_type"],
)

Create an index using the configured vector_store: index = VectorStoreIndex.from_vector_store(vector_store=vector_store) Define MetadataFilters with an 'or' condition and multiple filters:

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="document_name", value="corp_HR_guideline_construction"),
        MetadataFilter(key="document_name", value="test_document")
    ],
    condition='or'
)

Initialize a query_engine with the index and the defined filters:

query_engine = index.as_query_engine(
    service_context=service_context,
    similarity_top_k=3,
    filters=MetadataFilters(
        filters=filters,
        condition=FilterCondition.OR,
    )
)

Execute a query expecting to apply 'OR' logic between conditions, but 'AND' logic is applied instead, resulting in no documents found for cases that should match with "OR" and redis error message that displays an incorrect filter passed as query: query_engine.query("Test")

Relevant Logs/Tracbacks

Exception has occurred: ValueError
No docs found on index 'test_index' with prefix 'llama/vector' and filters '(@document_name:{corp_HR_guideline_construction} & @document_name:{corporate})'. * Did you originally create the index with a different prefix? * Did you index your metadata fields when you created the index?
  File "C:\Users\mtank\Coding\WebSmartAssistant\src\tests\locaL_test.py", line 112, in main
    res = await engine.aquery("What is the purpose of the HR guideline?")
  File "C:\Users\mtank\Coding\WebSmartAssistant\src\tests\locaL_test.py", line 132, in <module>
    asyncio.run(main())
ValueError: No docs found on index 'test_index' with prefix 'llama/vector' and filters '(@document_name:{corp_HR_guideline_construction} & @document_name:{corporate})'. * Did you originally create the index with a different prefix? * Did you index your metadata fields when you created the index?
dosubot[bot] commented 4 months ago

πŸ€–

Hey @IIe4enka, great to see you back! Hope you've been doing well since our last chat.

Thank you for providing the detailed information. It seems you're encountering an issue with the RedisVectorStore implementation in LlamaIndex version 0.9.46, where the & operator is hard-coded for combining multiple metadata filters. This indeed forces all filters to be applied with an "AND" condition, which is not the desired behavior when you want to use an "OR" condition.

A similar issue was reported in the past (Issue #7535) and a workaround was suggested. The workaround involves modifying the _to_redis_filters function in the RedisVectorStore class to use the | operator instead of the & operator. However, this workaround applies the "OR" operation to all filters, not just specific ones.

Here's the suggested modification:

def _to_redis_filters(metadata_filters: MetadataFilters) -> str:
    tokenizer = TokenEscaper()

    filter_strings = []
    for filter in metadata_filters.filters:
        filter_string = "@%s:{%s}" % (filter.key, tokenizer.escape(str(filter.value)))
        filter_strings.append(filter_string)

    joined_filter_strings = " | ".join(filter_strings)  # changed from " & " to " | "
    return f"({joined_filter_strings})"

Please note that this is a simple workaround and might not cover all use cases. For more complex queries, you might need to implement a more sophisticated query builder that can handle different types of logical operations.

Also, please be aware that changing the function in this way will affect all queries that use the RedisVectorStore retriever, so you should thoroughly test your application after making this change to ensure it still behaves as expected.

I hope this helps! Let me know if you have any other questions. You can find the original issue and the suggested solution here.

Sources

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
IIe4enka commented 4 months ago

Is there a more laconic fix to this? Is it comming soon or should we build a failsafe workaround ourselves?

logan-markewich commented 4 months ago

Would love a PR to fix this. Would just need to update the function that is creating the redis filters

dosubot[bot] commented 1 month ago

Hi, @IIe4enka,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, the issue was raised regarding the RedisVectorStore implementation in LlamaIndex using the & operator to combine metadata filters, resulting in an "AND" condition and ignoring any specified "OR" condition. I provided a detailed response suggesting a workaround to modify the _to_redis_filters function in the RedisVectorStore class to use the | operator instead of the & operator. Logan-markewich expressed interest in a pull request to fix the issue, indicating that updating the function creating the redis filters would be sufficient. IIe4enka followed up with a question about a more concise fix and whether a solution is forthcoming or if they should implement a workaround themselves.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Dosu