run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: ChromaDb Integration is Broken after recent ChromaDb Update #16776

Open Georgehe4 opened 3 hours ago

Georgehe4 commented 3 hours ago

Bug Description

Reproducible using colab code at https://docs.llamaindex.ai/en/stable/examples/vector_stores/ChromaIndexDemo/

Breaking change in ChromaDb: https://github.com/chroma-core/chroma/pull/2899

Issue: LlamaIndex still passes an empty dict {} as the default 'where' argument, which ChromaDb's 'where' validation no longer accepts: https://github.com/run-llama/llama_index/blob/35a13b96a61bbbbab026e1b8c5465d10dff0a759/llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/llama_index/vector_stores/chroma/base.py#L258
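
The failure can be illustrated without installing chromadb. The sketch below is a simplified stand-in that copies only the two checks visible in the traceback further down (the `is not None` guard and the `len(where) != 1` test); it is not ChromaDb's full validator, and the `check` helper and the sample metadata key are made up for illustration.

```python
from typing import Any, Dict, Optional


def validate_where(where: Dict[str, Any]) -> None:
    """Stand-in for the checks quoted in the traceback (not the full validator)."""
    if not isinstance(where, dict):
        raise ValueError(f"Expected where to be a dict, got {where}")
    if len(where) != 1:
        raise ValueError(f"Expected where to have exactly one operator, got {where}")


def validate_filter_set(where: Optional[Dict[str, Any]]) -> None:
    """Stand-in: the filter is only validated when one was supplied."""
    if where is not None:
        validate_where(where)


def check(where: Optional[Dict[str, Any]]) -> str:
    """Hypothetical helper: return 'ok' or the validation error message."""
    try:
        validate_filter_set(where)
        return "ok"
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(check(None))                  # no filter: validation is skipped
    print(check({"author": "someone"}))  # one key: passes the length check
    print(check({}))                    # empty dict: rejected
```

Under these assumptions, None and a single-key filter pass, while the empty dict that the integration sends by default triggers the same message as in the logs.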

Version

0.11

Steps to Reproduce

Reproducible using colab code at https://docs.llamaindex.ai/en/stable/examples/vector_stores/ChromaIndexDemo/

Relevant Logs/Tracebacks

ValueError                                Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/CollectionCommon.py](https://localhost:8080/#) in wrapper(self, *args, **kwargs)
     89             try:
---> 90                 return func(self, *args, **kwargs)
     91             except Exception as e:

21 frames
[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/CollectionCommon.py](https://localhost:8080/#) in _validate_and_prepare_query_request(self, query_embeddings, query_texts, query_images, query_uris, n_results, where, where_document, include)
    293         validate_base_record_set(record_set=query_records)
--> 294         validate_filter_set(filter_set=filters)
    295         validate_include(include=include)

[/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py](https://localhost:8080/#) in validate_filter_set(filter_set)
    338     if filter_set["where"] is not None:
--> 339         validate_where(filter_set["where"])
    340     if filter_set["where_document"] is not None:

[/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py](https://localhost:8080/#) in validate_where(where)
    597     if len(where) != 1:
--> 598         raise ValueError(f"Expected where to have exactly one operator, got {where}")
    599     for key, value in where.items():

ValueError: Expected where to have exactly one operator, got {}

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
[<ipython-input-13-80bd16794fec>](https://localhost:8080/#) in <cell line: 23>()
     21 # Query Data from the persisted index
     22 query_engine = index.as_query_engine()
---> 23 response = query_engine.query("What did the author do growing up?")
     24 display(Markdown(f"<b>{response}</b>"))

[/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py](https://localhost:8080/#) in wrapper(func, instance, args, kwargs)
    309 
    310             try:
--> 311                 result = func(*args, **kwargs)
    312                 if isinstance(result, asyncio.Future):
    313                     # If the result is a Future, wrap it

[/usr/local/lib/python3.10/dist-packages/llama_index/core/base/base_query_engine.py](https://localhost:8080/#) in query(self, str_or_query_bundle)
     50             if isinstance(str_or_query_bundle, str):
     51                 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 52             query_result = self._query(str_or_query_bundle)
     53         dispatcher.event(
     54             QueryEndEvent(query=str_or_query_bundle, response=query_result)

[/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py](https://localhost:8080/#) in wrapper(func, instance, args, kwargs)
    309 
    310             try:
--> 311                 result = func(*args, **kwargs)
    312                 if isinstance(result, asyncio.Future):
    313                     # If the result is a Future, wrap it

[/usr/local/lib/python3.10/dist-packages/llama_index/core/query_engine/retriever_query_engine.py](https://localhost:8080/#) in _query(self, query_bundle)
    176             CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    177         ) as query_event:
--> 178             nodes = self.retrieve(query_bundle)
    179             response = self._response_synthesizer.synthesize(
    180                 query=query_bundle,

[/usr/local/lib/python3.10/dist-packages/llama_index/core/query_engine/retriever_query_engine.py](https://localhost:8080/#) in retrieve(self, query_bundle)
    131 
    132     def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 133         nodes = self._retriever.retrieve(query_bundle)
    134         return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
    135 

[/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py](https://localhost:8080/#) in wrapper(func, instance, args, kwargs)
    309 
    310             try:
--> 311                 result = func(*args, **kwargs)
    312                 if isinstance(result, asyncio.Future):
    313                     # If the result is a Future, wrap it

[/usr/local/lib/python3.10/dist-packages/llama_index/core/base/base_retriever.py](https://localhost:8080/#) in retrieve(self, str_or_query_bundle)
    243                 payload={EventPayload.QUERY_STR: query_bundle.query_str},
    244             ) as retrieve_event:
--> 245                 nodes = self._retrieve(query_bundle)
    246                 nodes = self._handle_recursive_retrieval(query_bundle, nodes)
    247                 retrieve_event.on_end(

[/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py](https://localhost:8080/#) in wrapper(func, instance, args, kwargs)
    309 
    310             try:
--> 311                 result = func(*args, **kwargs)
    312                 if isinstance(result, asyncio.Future):
    313                     # If the result is a Future, wrap it

[/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/retrievers/retriever.py](https://localhost:8080/#) in _retrieve(self, query_bundle)
    101                     )
    102                 )
--> 103         return self._get_nodes_with_embeddings(query_bundle)
    104 
    105     @dispatcher.span

[/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/vector_store/retrievers/retriever.py](https://localhost:8080/#) in _get_nodes_with_embeddings(self, query_bundle_with_embeddings)
    178     ) -> List[NodeWithScore]:
    179         query = self._build_vector_store_query(query_bundle_with_embeddings)
--> 180         query_result = self._vector_store.query(query, **self._kwargs)
    181         return self._build_node_list_from_query_result(query_result)
    182 

[/usr/local/lib/python3.10/dist-packages/llama_index/vector_stores/chroma/base.py](https://localhost:8080/#) in query(self, query, **kwargs)
    369             return self._get(limit=query.similarity_top_k, where=where, **kwargs)
    370 
--> 371         return self._query(
    372             query_embeddings=query.query_embedding,
    373             n_results=query.similarity_top_k,

[/usr/local/lib/python3.10/dist-packages/llama_index/vector_stores/chroma/base.py](https://localhost:8080/#) in _query(self, query_embeddings, n_results, where, **kwargs)
    379         self, query_embeddings: List["float"], n_results: int, where: dict, **kwargs
    380     ) -> VectorStoreQueryResult:
--> 381         results = self._collection.query(
    382             query_embeddings=query_embeddings,
    383             n_results=n_results,

[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py](https://localhost:8080/#) in query(self, query_embeddings, query_texts, query_images, query_uris, n_results, where, where_document, include)
    208         """
    209 
--> 210         query_request = self._validate_and_prepare_query_request(
    211             query_embeddings=query_embeddings,
    212             query_texts=query_texts,

[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/CollectionCommon.py](https://localhost:8080/#) in wrapper(self, *args, **kwargs)
     91             except Exception as e:
     92                 msg = f"{str(e)} in {name}."
---> 93                 raise type(e)(msg).with_traceback(e.__traceback__)
     94 
     95         return wrapper

[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/CollectionCommon.py](https://localhost:8080/#) in wrapper(self, *args, **kwargs)
     88         def wrapper(self: Any, *args: Any, **kwargs: Any) -> T:
     89             try:
---> 90                 return func(self, *args, **kwargs)
     91             except Exception as e:
     92                 msg = f"{str(e)} in {name}."

[/usr/local/lib/python3.10/dist-packages/chromadb/api/models/CollectionCommon.py](https://localhost:8080/#) in _validate_and_prepare_query_request(self, query_embeddings, query_texts, query_images, query_uris, n_results, where, where_document, include)
    292         # Validate
    293         validate_base_record_set(record_set=query_records)
--> 294         validate_filter_set(filter_set=filters)
    295         validate_include(include=include)
    296         validate_n_results(n_results=n_results)

[/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py](https://localhost:8080/#) in validate_filter_set(filter_set)
    337 def validate_filter_set(filter_set: FilterSet) -> None:
    338     if filter_set["where"] is not None:
--> 339         validate_where(filter_set["where"])
    340     if filter_set["where_document"] is not None:
    341         validate_where_document(filter_set["where_document"])

[/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py](https://localhost:8080/#) in validate_where(where)
    596         raise ValueError(f"Expected where to be a dict, got {where}")
    597     if len(where) != 1:
--> 598         raise ValueError(f"Expected where to have exactly one operator, got {where}")
    599     for key, value in where.items():
    600         if not isinstance(key, str):

ValueError: Expected where to have exactly one operator, got {} in query.

dosubot[bot] commented 3 hours ago

The issue you're encountering is due to a change in ChromaDb's handling of the 'where' parameter. The error message indicates that the 'where' parameter is expected to have exactly one operator, but it is currently being passed as an empty dictionary {}. This is likely because the default value for the 'where' parameter is not being set correctly in the LlamaIndex integration with ChromaDb.

To resolve this, the integration should stop sending an empty dictionary when no filters are provided. The traceback shows that validate_filter_set only calls validate_where when 'where' is not None, so defaulting to None instead of {} skips the operator check entirely.

Here's a potential fix in the query method:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    ...
    if query.filters is not None:
        where = _to_chroma_filter(query.filters)
    else:
        # Default to None (no filter); ChromaDb rejects an empty dict
        where = kwargs.pop("where", None)
    ...

This change ensures that the integration no longer sends an empty 'where' dictionary by default when no filters are applied.
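
The default handling discussed above can also be sketched as a small standalone helper. This is only an illustration: `resolve_where` is a hypothetical name, not part of LlamaIndex, and it assumes ChromaDb treats None as "no filter", as the `is not None` guard in the traceback suggests.

```python
from typing import Any, Dict, Optional


def resolve_where(
    filter_where: Optional[Dict[str, Any]], kwargs: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: choose the 'where' value to send to the collection.

    filter_where is the already-converted filter (or None if no filters were
    given); kwargs are the extra query keyword arguments.
    """
    if filter_where is not None:
        where = filter_where
    else:
        # Fall back to a caller-supplied 'where', defaulting to None
        where = kwargs.pop("where", None)
    # Normalise an empty dict to None so it means "no filter"
    return where or None
```

Normalising {} to None also covers callers that pass an empty 'where' explicitly, which a changed default alone would not.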


Georgehe4 commented 2 hours ago

I think the best option would be to check if we can update ChromaDb to support empty where {} again.