run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
36.85k stars, 5.28k forks

[Bug]: MetadataFilters FilterOperator.IN does not work for Chromadb #12772

Closed bhomass closed 1 month ago

bhomass commented 7 months ago

Bug Description

I created a vector store using Chromadb and created a retriever from it. I want to retrieve using both semantic search and a metadata filter. The filter works for EQ, but blows up for IN. Is IN supposed to work? I tried calling _to_chroma_filter(filters) directly, but it seems that call has been deprecated.

There is nothing wrong with my documents or nodes because as I said the EQ filter does work.

Version

0.10.28

Steps to Reproduce

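from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters
from llama_index.vector_stores.chroma import ChromaVectorStore

# chroma_collection and nodes were created earlier (not shown in this report)
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
sentence_index = VectorStoreIndex(nodes, storage_context=storage_context)

desired_ids = ['10000032', '10000764']

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="subject_id", operator=FilterOperator.IN, value=desired_ids),
    ],
)
retriever = sentence_index.as_retriever(filters=filters)
retriever.retrieve("Find all patients")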

The error is: Unexpected exception formatting exception. Falling back to standard exception

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_33278/3927084781.py", line 14, in <module>
    retriever.retrieve("Find all patients")
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 102, in wrapper
    def set_current_span_id(self, value: str):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 77, in span_drop
    span_handlers: List[BaseSpanHandler] = [],
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/base.py", line 45, in span_drop
    def class_name(cls) -> str:
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/null.py", line 33, in prepare_to_drop_span
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 100, in wrapper
    return None
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py", line 238, in retrieve
    query_bundle = str_or_query_bundle
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 102, in wrapper
    def set_current_span_id(self, value: str):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 77, in span_drop
    span_handlers: List[BaseSpanHandler] = [],
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/base.py", line 45, in span_drop
    def class_name(cls) -> str:
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/span_handlers/null.py", line 33, in prepare_to_drop_span
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py", line 100, in wrapper
    return None
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 101, in _retrieve
    return self._get_nodes_with_embeddings(query_bundle)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 176, in _get_nodes_with_embeddings
    query = self._build_vector_store_query(query_bundle_with_embeddings)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 290, in query
    where = _to_chroma_filter(query.filters)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 69, in _to_chroma_filter
    _transform_chroma_filter_operator(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 52, in _transform_chroma_filter_operator
    raise ValueError(f"Filter operator {operator} not supported")
ValueError: Filter operator in not supported

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2102, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1310, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1199, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1052, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 978, in format_exception_as_a_whole
    frames.append(self.format_record(record))
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 878, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/IPython/core/ultratb.py", line 712, in lines
    return self._sd.lines
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 681, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/stack_data/core.py", line 660, in executing_piece
    return only(
  File "/data/anaconda3/envs/llamaindex10/lib/python3.10/site-packages/executing/executing.py", line 116, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

dosubot[bot] commented 7 months ago

The error occurs because the _transform_chroma_filter_operator function in the LlamaIndex ChromaDB integration does not support the IN operator. This function translates standard metadata filter operators into their ChromaDB equivalents; since it has no case for IN, it raises the ValueError you're seeing.

To address this issue, you could modify the _transform_chroma_filter_operator function to include support for the IN operator, assuming you're familiar with ChromaDB's query language. Here's an example modification:

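def _transform_chroma_filter_operator(operator: str) -> str:
    # FilterOperator.IN carries the string value "in" (see the ValueError
    # message in the traceback), so compare against the lowercase form.
    if operator == "in":
        return "$in"
    # Include other operators and their translations
    raise ValueError(f"Filter operator {operator} not supported")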

This modification assumes ChromaDB uses a syntax similar to MongoDB for the IN operator. You'll also need to ensure your code correctly handles cases where filter.value is a list of values, as the IN operator applies to multiple values.

Keep in mind, this is a workaround. For a permanent solution, consider reaching out to the LlamaIndex team to request official support for the IN operator in their ChromaDB integration.
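
A slightly fuller sketch of the idea in this comment, written in plain Python with no LlamaIndex or Chroma imports. The names OPERATOR_MAP, to_chroma_operator, and to_chroma_clause are hypothetical and are not the integration's actual code. The operator strings on the left of the table are assumptions, except "in", which the traceback shows as the value of FilterOperator.IN.

```python
from typing import Any, Dict

# Hypothetical lookup table: filter operator strings -> Chroma operators.
# The left-hand strings are assumptions, apart from "in" (seen in the traceback).
OPERATOR_MAP: Dict[str, str] = {
    "==": "$eq",
    "!=": "$ne",
    ">": "$gt",
    "<": "$lt",
    ">=": "$gte",
    "<=": "$lte",
    "in": "$in",
}


def to_chroma_operator(operator: str) -> str:
    """Translate an operator string, failing loudly on anything unknown."""
    try:
        return OPERATOR_MAP[operator]
    except KeyError:
        raise ValueError(f"Filter operator {operator} not supported") from None


def to_chroma_clause(key: str, operator: str, value: Any) -> Dict[str, Any]:
    """Build one Chroma-style where clause, e.g. {"k": {"$in": [...]}}."""
    chroma_op = to_chroma_operator(operator)
    if chroma_op == "$in" and not isinstance(value, list):
        # "$in" only makes sense with a list of candidate values.
        raise TypeError("the 'in' operator expects a list of values")
    return {key: {chroma_op: value}}


clause = to_chroma_clause("subject_id", "in", ["10000032", "10000764"])
print(clause)
```

The list check covers the point above about filter.value: a scalar passed with IN is rejected early instead of producing a malformed where clause.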

logan-markewich commented 7 months ago

As the error states, it is indeed not supported.

If chroma does support this, it should be added here https://github.com/run-llama/llama_index/blob/f1124cffd4ba59ebc58839024fa8aaefef679214/llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/llama_index/vector_stores/chroma/base.py#L37

dsanr commented 6 months ago

@logan-markewich It seems the 'IN' operator is supported in chromadb. https://docs.trychroma.com/guides#using-inclusion-operators-(-and-) Could you please check this?
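
For reference, a minimal sketch of what the two forms look like as Chroma-style where dictionaries: the $in clause this comment refers to, and an equivalent built only from $eq and $or, which may help anyone on a release without IN support. The helper name in_as_or_of_eq is hypothetical, and the rule that $or needs at least two expressions is an assumption about Chroma's validation. Whether a given LlamaIndex version can express the OR form through MetadataFilters is not confirmed in this thread.

```python
from typing import Any, Dict, List


def in_as_or_of_eq(key: str, values: List[Any]) -> Dict[str, Any]:
    """Express `key IN values` using only $eq and $or (hypothetical helper)."""
    if not values:
        raise ValueError("need at least one value to match against")
    clauses = [{key: {"$eq": v}} for v in values]
    # Assumption: $or should wrap at least two expressions, so a single
    # value is returned as a bare $eq clause.
    if len(clauses) == 1:
        return clauses[0]
    return {"$or": clauses}


# The direct form, using the inclusion operator.
where_in = {"subject_id": {"$in": ["10000032", "10000764"]}}

# The expanded form, matching the same documents without $in.
where_or = in_as_or_of_eq("subject_id", ["10000032", "10000764"])

print(where_in)
print(where_or)
```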

dsanr commented 5 months ago

@logan-markewich This issue can be closed now https://github.com/run-llama/llama_index/pull/14010