Checked other resources
[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# Three documents created without a metadata argument
document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning."
)
document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees."
)
document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!"
)
documents = [document_1, document_2, document_3]
ids = [str(i) for i in range(len(documents))]

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma(collection_name="test", embedding_function=embeddings, persist_directory="testdb")
vectorstore.add_documents(documents=documents, ids=ids)

# The IDs were stored as strings, so the ID to update must be a string too
id_to_be_updated = "2"
updated_doc = Document(page_content="This is a test document.")
# This call raises the ValueError shown below because updated_doc has no metadata
vectorstore.update_documents(ids=[id_to_be_updated], documents=[updated_doc])
Error Message and Stack Trace (if applicable)
Traceback (most recent call last):
  File "C:\Users\1956750\PycharmProjects\vectordb_crud\adhoc.py", line 62, in <module>
    vectorstore.update_documents(ids=[replace_id], documents=[Document(page_content="Raspberry pi is a microprocessor",)])
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\langchain_community\vectorstores\chroma.py", line 774, in update_documents
    self._collection.update(
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\chromadb\api\models\Collection.py", line 259, in update
    ) = self._validate_and_prepare_update_request(
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 480, in _validate_and_prepare_update_request
    ) = self._validate_embedding_set(
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 182, in _validate_embedding_set
    validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\chromadb\api\types.py", line 336, in validate_metadatas
    validate_metadata(metadata)
  File "C:\Users\1956750\PycharmProjects\vectordb_crud.venv\lib\site-packages\chromadb\api\types.py", line 288, in validate_metadata
    raise ValueError(
ValueError: Expected metadata to be a non-empty dict, got 0 metadata attributes
Description
While trying to update documents in an existing Chroma collection with the update_documents method, I get a ValueError because the Document object has no metadata, even though the metadata parameter is optional. Following the stack trace, I found that the issue occurs because an empty metadata list is created even when no metadata argument is supplied. This empty object fails the validate_metadata check in the chromadb library, which expects the argument to be None (a null object) when no metadata is passed. As a workaround, I supplied some arbitrary metadata, and the documents were updated as expected. However, I believe this behavior was not intended by the Chroma developers, since their checks allow metadata to be omitted.
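To make the failure mode concrete, here is a minimal sketch that assumes the check behaves the way the error message suggests: None is accepted, an empty dict is rejected. Both check_metadata and normalize_metadatas are hypothetical names used for illustration only. They are not part of chromadb or LangChain, and the check is a simplified stand-in rather than the library's actual code.

```python
from typing import Any, Dict, List, Optional

Metadata = Dict[str, Any]


def check_metadata(metadata: Optional[Metadata]) -> None:
    """Simplified stand-in for the check named in the traceback (assumption)."""
    if metadata is None:
        return  # no metadata supplied: accepted
    if not isinstance(metadata, dict):
        raise ValueError(
            f"Expected metadata to be a dict, got {type(metadata).__name__}"
        )
    if len(metadata) == 0:
        raise ValueError(
            f"Expected metadata to be a non-empty dict, got {len(metadata)} metadata attributes"
        )


def normalize_metadatas(
    metadatas: List[Metadata],
) -> Optional[List[Optional[Metadata]]]:
    """Hypothetical helper: map empty dicts to None, and return None
    when none of the documents carries any metadata."""
    cleaned = [m if m else None for m in metadatas]
    return cleaned if any(m is not None for m in cleaned) else None


# A Document created without a metadata argument carries an empty dict.
raw = [{}]
try:
    check_metadata(raw[0])
    outcome = "accepted"
except ValueError as exc:
    outcome = str(exc)

# After normalization there is nothing left for the check to reject.
normalized = normalize_metadatas(raw)
```

If the wrapper applied something like normalize_metadatas before calling the collection's update method, documents without metadata would presumably pass the check, and no placeholder metadata would be needed.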
System Info
System Information