Closed: jayant-yadav closed this issue 8 months ago.
🤖
Hello @jayant-yadav,
Nice to meet you! I'm here to assist you with your issues and questions related to the LangChain repository. I'm a bot created to provide immediate responses while you're waiting for a human maintainer. Please don't hesitate to ask me anything you need. Let's get started!
Based on the error message you provided, it seems that the `EmbeddingFunction` interface has indeed changed. The error message says that the `EmbeddingFunction.__call__` method is now expected to have the signature `odict_keys(['self', 'input'])`, but the `HuggingFaceBgeEmbeddings` class has the signature `odict_keys(['self', 'args', 'kwargs'])`.
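To make the mismatch concrete, here is a minimal, stdlib-only sketch of the kind of check the error message describes: it compares the parameter names of an object's `__call__` method against `['self', 'input']`. The helper `call_signature_keys` and the two example classes are hypothetical illustrations. This is not ChromaDB's actual validation code.

```python
import inspect
from typing import List


def call_signature_keys(obj: object) -> List[str]:
    """Return the parameter names of type(obj).__call__, including 'self'."""
    # Inspect the class attribute (a plain function) so that 'self' is listed,
    # matching the shape of the keys shown in the error message.
    return list(inspect.signature(type(obj).__call__).parameters.keys())


class NewStyleEmbeddingFunction:
    # Hypothetical embedding function written against the new interface.
    def __call__(self, input: List[str]) -> List[List[float]]:
        # Toy "embedding": one float per text (its length).
        return [[float(len(text))] for text in input]


class GenericCallable:
    # Hypothetical wrapper that forwards arbitrary arguments.
    def __call__(self, *args, **kwargs):
        return None


EXPECTED = ["self", "input"]
print(call_signature_keys(NewStyleEmbeddingFunction()) == EXPECTED)  # True
print(call_signature_keys(GenericCallable()))  # ['self', 'args', 'kwargs']
```

A callable that accepts `*args, **kwargs` can technically receive `input`, but a check based on parameter *names* still rejects it. That is why the error appears even though the call itself might work.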
`EmbeddingFunction` is defined by ChromaDB rather than in the LangChain codebase. It is the interface Chroma expects embedding functions to implement. The `HuggingFaceBgeEmbeddings` class implements LangChain's own `Embeddings` interface (`embed_documents` and `embed_query`) and uses `sentence_transformers.SentenceTransformer` to encode the documents and queries.
To resolve this issue, you might need to update the `HuggingFaceBgeEmbeddings` class to match the new `EmbeddingFunction` interface. Specifically, you might need to change the `embed_documents` and `embed_query` methods to accept a single `input` argument instead of `args` and `kwargs`. Here's an example of how you might do this:
```python
from typing import List


# Suggested replacements for methods of HuggingFaceBgeEmbeddings; they rely on
# the instance attributes client, encode_kwargs and query_instruction.
def embed_documents(self, input: List[str]) -> List[List[float]]:
    """Compute doc embeddings using a HuggingFace transformer model.

    Args:
        input: The list of texts to embed.

    Returns:
        List of embeddings, one for each text.
    """
    # Flatten newlines into spaces before encoding.
    input = [t.replace("\n", " ") for t in input]
    embeddings = self.client.encode(input, **self.encode_kwargs)
    return embeddings.tolist()


def embed_query(self, input: str) -> List[float]:
    """Compute query embeddings using a HuggingFace transformer model.

    Args:
        input: The text to embed.

    Returns:
        Embeddings for the text.
    """
    input = input.replace("\n", " ")
    # Prepend the configured query instruction before encoding the query.
    embedding = self.client.encode(
        self.query_instruction + input, **self.encode_kwargs
    )
    return embedding.tolist()
```
Please note that this is just a suggestion and might not fully resolve the issue. You might need to make additional changes to the `HuggingFaceBgeEmbeddings` class to fully comply with the new `EmbeddingFunction` interface. I recommend checking the ChromaDB migration logs and the `EmbeddingFunction` interface documentation for more details.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Quick workaround until `HuggingFaceEmbeddings` is updated:

```
pip install chromadb==0.4.15
```

This breaking change affects chromadb 0.4.16.
LangChain 0.0.332 was released with the Chroma team's fix for compatibility with chromadb>=0.4.16. You can now use the latest versions of both:

```
pip install -U langchain chromadb
```
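The version constraints from this thread can be expressed as a small check. chromadb 0.4.16 introduced the breaking change, and LangChain 0.0.332 contains the compatibility fix. The helpers `parse_release` and `is_compatible` below are a hypothetical sketch. They only parse plain `X.Y.Z` release strings (no pre-release or local suffixes). In real code, `packaging.version.Version` is the more robust choice.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Thresholds taken from this thread.
CHROMADB_BREAKING = parse_release("0.4.16")
LANGCHAIN_FIXED = parse_release("0.0.332")


def is_compatible(chromadb_version: str, langchain_version: str) -> bool:
    """Return True if this chromadb/langchain pair avoids the signature error
    discussed here (other, unrelated incompatibilities are ignored)."""
    if parse_release(chromadb_version) < CHROMADB_BREAKING:
        # Older chromadb: the pin-to-0.4.15 workaround applies.
        return True
    # Newer chromadb needs a LangChain release that includes the fix.
    return parse_release(langchain_version) >= LANGCHAIN_FIXED


print(is_compatible("0.4.15", "0.0.331"))  # True  (pinned workaround)
print(is_compatible("0.4.16", "0.0.331"))  # False (breaking change, no fix)
print(is_compatible("0.4.16", "0.0.332"))  # True  (fix released)
```

The installed versions can be read with the standard library's `importlib.metadata.version("chromadb")` and `importlib.metadata.version("langchain")`. This check only reflects LangChain's `Chroma` vector store wrapper. Passing a LangChain embeddings object directly to a native ChromaDB collection is a different code path.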
Actually, it looks like there was something specific to `HuggingFaceBgeEmbeddings` as well. Could you confirm whether 0.0.332 with the Chroma fix addresses this, and reopen if it's something that needs to be addressed in the Hugging Face integration?
@efriis The fix in 0.0.332 works! LangChain's latest version (0.0.332) is now compatible with chromadb==0.4.16. If possible, I would like to know where the changes were made to fix this issue.
This is fixed now. Thank you, @efriis!
Thanks, this fixed my error!
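I am using chromadb 0.5.0 and langchain 0.2.1, and I still run into this error when I host ChromaDB in a Docker container:

```python
import chromadb
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# modelPath, model_kwargs and encode_kwargs are defined elsewhere (not shown).
hf = HuggingFaceBgeEmbeddings(
    model_name=modelPath,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    cache_folder="./cache",
)

# Connect to the ChromaDB server running in the Docker container.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
collection = chroma_client.create_collection(name="DATA_V3", embedding_function=hf)
```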
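One likely reason the native client still fails: `create_collection` expects a ChromaDB embedding function, a callable whose `__call__` takes `input`. `HuggingFaceBgeEmbeddings`, by contrast, is a LangChain `Embeddings` object that exposes `embed_documents` and `embed_query`. A common pattern is a thin adapter between the two.

The sketch below is stdlib-only and hypothetical. `LangChainEmbeddingAdapter`, `SupportsEmbedDocuments` and `FakeEmbeddings` are illustrative names, not LangChain or ChromaDB APIs. In real code you would probably subclass ChromaDB's `EmbeddingFunction` (from `chromadb.api.types`) and pass the adapter instead of `hf`.

```python
from typing import List, Protocol


class SupportsEmbedDocuments(Protocol):
    # Structural type for LangChain-style embeddings objects (assumption:
    # only embed_documents is needed for indexing documents).
    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


class LangChainEmbeddingAdapter:
    """Hypothetical adapter exposing __call__(self, input), the parameter
    names the new EmbeddingFunction interface checks for."""

    def __init__(self, embeddings: SupportsEmbedDocuments) -> None:
        self._embeddings = embeddings

    def __call__(self, input: List[str]) -> List[List[float]]:
        # Delegate to the wrapped object's embed_documents method.
        return self._embeddings.embed_documents(list(input))


class FakeEmbeddings:
    """Stand-in for HuggingFaceBgeEmbeddings so the sketch runs offline."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Toy "embedding": text length and number of spaces.
        return [[float(len(t)), float(t.count(" "))] for t in texts]


adapter = LangChainEmbeddingAdapter(FakeEmbeddings())
print(adapter(["hello world", "hi"]))  # [[11.0, 1.0], [2.0, 0.0]]
```

With the real objects, the call would become something like `chroma_client.create_collection(name="DATA_V3", embedding_function=LangChainEmbeddingAdapter(hf))`. Whether ChromaDB 0.5.0 also requires the adapter to subclass its `EmbeddingFunction` base class is an assumption worth checking against the ChromaDB documentation.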
System Info
Using Google Colab Free version with T4 GPU. chromadb==0.4.16
Who can help?
@agola11 @hwchase17
Reproduction
As per the latest ChromaDB migration logs (link), the `EmbeddingFunction` definition has been updated, and this affects all custom embedding functions. This means `langchain.embeddings.HuggingFaceBgeEmbeddings` is inconsistent with the new definition and throws the following error:

This can be reproduced by inserting documents into ChromaDB, embedded with `HuggingFaceBgeEmbeddings`, like so:

I am not sure, but the answer might lie in correcting the `HuggingFaceBgeEmbeddings` class: link?

Expected behavior
The expected behavior is that running the code creates a valid `db` object.