langchain-ai / langchain-google

[genai] GoogleGenerativeAIEmbeddings not working with langchain_chroma.Chroma #247

Status: Closed (closed by adamvig96 1 month ago)

adamvig96 commented 1 month ago

Initializing a Chroma vectorstore fails with GoogleGenerativeAIEmbeddings

Issue: GoogleGenerativeAIEmbeddings.embed_documents returns List[proto.marshal.collections.repeated.Repeated[float]] instead of List[List[float]]. Chroma.from_texts expects each embedding to be a plain Python list, so chromadb's validate_embeddings check rejects the Repeated objects and raises a ValueError.
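To illustrate the mismatch without calling the Google API, here is a minimal sketch. `FakeRepeated` is a hypothetical stand-in that I am assuming behaves like the proto `Repeated` container in the one respect that matters here: it is sequence-like but not a `list` subclass. The check mirrors the `isinstance(e, list)` test visible in the traceback below.

```python
from collections.abc import Sequence


class FakeRepeated(Sequence):
    """Hypothetical stand-in for proto's Repeated: sequence-like, but not a list."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


# Two fake "embeddings" shaped like what embed_documents is reported to return
embeddings = [FakeRepeated([0.1, 0.2]), FakeRepeated([0.3, 0.4])]

# Same kind of check as in the traceback: every embedding must be a list
passes_check = all(isinstance(e, list) for e in embeddings)

# Converting each embedding to a plain list makes the same check pass
converted = [list(e) for e in embeddings]
passes_after_conversion = all(isinstance(e, list) for e in converted)
```

So the values themselves are fine; only the container type trips the validation.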

Sample code to reproduce the error:

from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

texts = ["hi", "hello", "How are you"]
embedding = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
Chroma.from_texts(texts=texts, embedding=embedding)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-67-2c69b857073b> in <cell line: 5>()
      3 
      4 texts = ["hi", "hello", "How are you"]
----> 5 Chroma.from_texts(texts=texts, embedding=GoogleGenerativeAIEmbeddings(model="models/text-embedding-004"))

4 frames
/usr/local/lib/python3.10/dist-packages/langchain_chroma/vectorstores.py in from_texts(cls, texts, embedding, metadatas, ids, collection_name, persist_directory, client_settings, client, collection_metadata, **kwargs)
    753                 documents=texts,
    754             ):
--> 755                 chroma_collection.add_texts(
    756                     texts=batch[3] if batch[3] else [],
    757                     metadatas=batch[2] if batch[2] else None,  # type: ignore

/usr/local/lib/python3.10/dist-packages/langchain_chroma/vectorstores.py in add_texts(self, texts, metadatas, ids, **kwargs)
    358                 )
    359         else:
--> 360             self._collection.upsert(
    361                 embeddings=embeddings,  # type: ignore
    362                 documents=texts,

/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py in upsert(self, ids, embeddings, metadatas, documents, images, uris)
    475             images,
    476             uris,
--> 477         ) = self._validate_embedding_set(
    478             ids, embeddings, metadatas, documents, images, uris
    479         )

/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py in _validate_embedding_set(self, ids, embeddings, metadatas, documents, images, uris, require_embeddings_or_data)
    545         valid_ids = validate_ids(maybe_cast_one_to_many_ids(ids))
    546         valid_embeddings = (
--> 547             validate_embeddings(
    548                 self._normalize_embeddings(maybe_cast_one_to_many_embedding(embeddings))
    549             )

/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py in validate_embeddings(embeddings)
    486         )
    487     if not all([isinstance(e, list) for e in embeddings]):
--> 488         raise ValueError(
    489             "Expected each embedding in the embeddings to be a list, got "
    490             f"{list(set([type(e).__name__ for e in embeddings]))}"

ValueError: Expected each embedding in the embeddings to be a list, got ['Repeated']
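
Until the embeddings class returns plain lists itself, one possible workaround is to wrap it and coerce the results. This is only a sketch: `ListCoercingEmbeddings` and `_TupleEmbedder` are hypothetical names I made up for illustration, not part of langchain-google or Chroma. The wrapper assumes only that the wrapped object exposes `embed_documents` and `embed_query`; `_TupleEmbedder` returns tuples as a stand-in for the non-list `Repeated` values.

```python
from typing import Any, List


class ListCoercingEmbeddings:
    """Hypothetical wrapper: delegates to `inner` and returns plain lists of floats."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Rebuild every embedding as a real list so isinstance(e, list) holds
        return [[float(x) for x in vec] for vec in self.inner.embed_documents(texts)]

    def embed_query(self, text: str) -> List[float]:
        return [float(x) for x in self.inner.embed_query(text)]


class _TupleEmbedder:
    """Stand-in embedder for the demo: returns tuples, i.e. non-list sequences."""

    def embed_documents(self, texts):
        return [(float(len(t)), 0.5) for t in texts]

    def embed_query(self, text):
        return (float(len(text)), 0.5)


wrapped = ListCoercingEmbeddings(_TupleEmbedder())
doc_vectors = wrapped.embed_documents(["hi", "hello", "How are you"])
query_vector = wrapped.embed_query("hi")
```

With the reproduction code above, the idea would be to pass `ListCoercingEmbeddings(embedding)` as the `embedding` argument to `Chroma.from_texts`. I am assuming Chroma only needs the two methods at runtime; if a strict type is required, the wrapper could subclass `langchain_core.embeddings.Embeddings` instead.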