langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

GoogleGenerativeAIEmbeddings embed_documents method returns list of Repeated Type #22411

Open ausarhuy opened 3 months ago

ausarhuy commented 3 months ago

Checked other resources

Example Code

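from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Placeholder input; any list of strings reproduces the issue.
queries = ["first text", "second text"]

# Requires GOOGLE_API_KEY to be set in the environment.
embeddings = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
vectors = embeddings.embed_documents(queries)

# Inspect the type of a single embedding vector.
print(type(vectors[0]))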

Error Message and Stack Trace (if applicable)

No response

Description

embed_documents returns each embedding as <class 'proto.marshal.collections.repeated.Repeated'> rather than as a List. It behaves like a List in most respects, but it fails when passed to any vectorstore.
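A minimal sketch of the symptom, written so it runs without an API key. FakeRepeated is a hypothetical stand-in for proto's Repeated (a mutable sequence that is not a list subclass), and to_plain_lists is an illustrative helper name, not part of any library. The strict isinstance check is only an assumption about what downstream code might do.

```python
from collections.abc import MutableSequence
from typing import Iterable, List, Sequence


class FakeRepeated(MutableSequence):
    """Hypothetical stand-in for proto's Repeated: list-like, but not a list."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value) -> None:
        self._values[index] = value

    def __delitem__(self, index) -> None:
        del self._values[index]

    def __len__(self) -> int:
        return len(self._values)

    def insert(self, index: int, value: float) -> None:
        self._values.insert(index, value)


def to_plain_lists(vectors: Iterable[Sequence[float]]) -> List[List[float]]:
    """Copy each vector into a built-in list of floats."""
    return [[float(x) for x in vector] for vector in vectors]


vectors = [FakeRepeated([0.1, 0.2]), FakeRepeated([0.3, 0.4])]

# Indexing and iteration work, which is why the value looks like a list...
first_component = vectors[0][0]
# ...but a strict type check, as downstream code might perform, fails.
is_list_before = isinstance(vectors[0], list)

# Converting up front gives plain nested lists.
plain = to_plain_lists(vectors)
is_list_after = isinstance(plain[0], list)
```

Converting once, right after the embedding call, keeps the rest of the pipeline independent of the sequence type the client library returns.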

System Info

System Information

OS: Linux
OS Version: #1 SMP Thu Jan 11 04:09:03 UTC 2024
Python Version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]

Package Information

langchain_core: 0.2.3
langchain: 0.2.1
langchain_community: 0.2.1
langsmith: 0.1.67
langchain_google_genai: 1.0.5
langchain_text_splitters: 0.2.0
langgraph: 0.0.60

ausarhuy commented 3 months ago
from typing import List, Optional

from langchain_google_genai import GoogleGenerativeAIEmbeddings

class GeminiEmbeddings(GoogleGenerativeAIEmbeddings):
    def embed_documents(self, texts: List[str],
                        task_type: Optional[str] = None,
                        titles: Optional[List[str]] = None,
                        output_dimensionality: Optional[int] = None) -> List[List[float]]:

        embeddings = super().embed_documents(texts, task_type, titles, output_dimensionality)
        # Convert Repeated type to list type
        return list(map(list, embeddings))

This is my current quick fix.

SwastikGorai commented 3 months ago
class GeminiEmbeddings(GoogleGenerativeAIEmbeddings):
    def embed_documents(self, texts: List[str],
                        task_type: Optional[str] = None,
                        titles: Optional[List[str]] = None,
                        output_dimensionality: Optional[int] = None) -> List[List[float]]:

        embeddings = super().embed_documents(texts, task_type, titles, output_dimensionality)
        # Convert Repeated type to list type
        return list(map(list, embeddings))

The workaround above throws TypeError: GoogleGenerativeAIEmbeddings.embed_documents() takes 2 positional arguments but 5 were given.

I am using it with:

...
self.embeddings = GeminiEmbeddings(model="models/embedding-001")
docsearch = Chroma.from_texts(..., embedding=self.embeddings, ...)
...

EDIT: Calling super().embed_documents(texts) with only the texts argument works.
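The TypeError is consistent with the parent method accepting its optional parameters as keyword-only arguments. The sketch below assumes exactly that. BaseEmbeddings is a hypothetical stand-in rather than the real class, and both wrapper names are illustrative. It reproduces the error with positional forwarding and shows that forwarding by keyword avoids it.

```python
from typing import List, Optional


class BaseEmbeddings:
    """Hypothetical stand-in; assumes the options are keyword-only."""

    def embed_documents(
        self,
        texts: List[str],
        *,
        task_type: Optional[str] = None,
        titles: Optional[List[str]] = None,
        output_dimensionality: Optional[int] = None,
    ) -> List[tuple]:
        # Tuples play the role of the non-list vectors in this sketch.
        return [(float(len(text)), 0.0) for text in texts]


class PositionalWrapper(BaseEmbeddings):
    def embed_documents(self, texts, task_type=None, titles=None,
                        output_dimensionality=None):
        # Passing the options positionally triggers the TypeError.
        vectors = super().embed_documents(
            texts, task_type, titles, output_dimensionality
        )
        return list(map(list, vectors))


class KeywordWrapper(BaseEmbeddings):
    def embed_documents(self, texts, **kwargs):
        # Forward options by keyword, then convert each vector to a list.
        vectors = super().embed_documents(texts, **kwargs)
        return list(map(list, vectors))


try:
    PositionalWrapper().embed_documents(["hello"])
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

result = KeywordWrapper().embed_documents(["hello", "hi"], task_type=None)
```

Forwarding **kwargs also means the wrapper does not need to track changes to the parent's optional parameters.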

GMartin-dev commented 3 months ago

This returned type "Repeated" also breaks the CacheBackedEmbeddings when wrapping GoogleGenerativeAIEmbeddings.
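One plausible explanation, offered as an assumption rather than a confirmed diagnosis: a cache that serializes vectors with the standard json module will reject any sequence that is not a built-in list or tuple. FakeRepeated below is a hypothetical stand-in for the Repeated type so that the snippet runs on its own.

```python
import json
from collections.abc import Sequence


class FakeRepeated(Sequence):
    """Hypothetical stand-in: a sequence that is not a list subclass."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


vector = FakeRepeated([0.1, 0.2, 0.3])

# json only knows how to encode built-in containers, so this raises TypeError.
try:
    json.dumps(vector)
    serialized_directly = True
except TypeError:
    serialized_directly = False

# Converting to a plain list first makes the round trip work.
payload = json.dumps(list(vector))
restored = json.loads(payload)
```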

ausarhuy commented 3 months ago

> This returned type "Repeated" also breaks the CacheBackedEmbeddings when wrapping GoogleGenerativeAIEmbeddings.

Yes, they should either convert Repeated to List or add support for Repeated to all the vectorstores.