langchain-ai / langchain

HuggingFaceInferenceAPIEmbeddings returns a 413 status code because it lacks the batching mechanism SentenceTransformer has #16443

Closed abhishek9998 closed 5 months ago

abhishek9998 commented 8 months ago

Example Code

import os

import pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import Pinecone

# inference_api_key, api_url, environment, index_name, and namespace
# are defined elsewhere in my setup.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key,
    api_url=api_url,
    model_name="bge-large-en-v1.5",
)
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment=environment)

loader = PyPDFDirectoryLoader("data")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)

vectordb = Pinecone.from_documents(chunks, embeddings, index_name=index_name, namespace=namespace)

This code snippet gets a 413 status code from the request built in huggingface.py:

response = requests.post(
    self._api_url,
    headers=self._headers,
    json={
        "inputs": texts,
        "options": {"wait_for_model": True, "use_cache": True},
    },
)
return response.json()

We should support a batch size here, like the local SentenceTransformer embeddings do.
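For illustration, here is a minimal sketch of what a batched embed_documents could look like; the batch_size parameter and the loop are hypothetical additions, not part of the current class:

from typing import List

import requests

def embed_documents(self, texts: List[str], batch_size: int = 32) -> List[List[float]]:
    # Hypothetical batched variant: send batch_size texts per request
    # instead of the whole list at once, then concatenate the results.
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        response = requests.post(
            self._api_url,
            headers=self._headers,
            json={
                "inputs": batch,
                "options": {"wait_for_model": True, "use_cache": True},
            },
        )
        embeddings.extend(response.json())
    return embeddings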

Description

I am trying to use Pinecone with the Hugging Face Inference API as the embedding model. I have 420 chunks in total, and the client tries to embed all of them in a single request. Also, embedding_chunk_size cannot be passed through the Pinecone.from_documents() method.

System Info

langchain-cli==0.0.20
langchain-community==0.0.14
langchain-core==0.1.14
langchainhub==0.1.14

dosubot[bot] commented 8 months ago

🤖

Thank you for bringing this to our attention. From the information provided, the HuggingFaceInferenceAPIEmbeddings class does not currently support batch processing. The inputs field of the requests.post call holds the full list of texts to embed, so all of them are sent in a single request, which is likely what triggers the 413 (Payload Too Large) status code you're encountering.

As for embedding_chunk_size not being accepted by the Pinecone.from_documents() method, this is likely because the method does not currently support that parameter.

Unfortunately, without the ability to create pull requests or issues, I'm unable to directly address these problems in the codebase. However, I would recommend considering the following potential solutions:

  1. Batch Processing: You could modify the HuggingFaceInferenceAPIEmbeddings class to support batch processing by splitting the list of texts into smaller batches and making a separate request for each batch. This could resolve the 413 error.

  2. Embedding Chunk Size: You could also modify the Pinecone.from_documents() method to support an embedding_chunk_size parameter. This would let you control how many texts are embedded per request, which could likewise prevent the 413 error; a caller-side workaround is sketched after the next paragraph.

Please note that these are just potential solutions and may not fully resolve the issues you're encountering. If you decide to implement these changes, please test them thoroughly to ensure they work as expected.
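In the meantime, you can work around the limit on the caller side by upserting the documents in small slices rather than passing all 420 chunks to a single from_documents() call. This is a sketch that reuses the chunks, embeddings, index_name, and namespace from your example; the batch size of 32 is an arbitrary choice small enough to stay under typical payload limits:

# Create the store from the first slice, then append the rest in batches,
# so no single embedding request carries the whole document set.
batch_size = 32
vectordb = Pinecone.from_documents(
    chunks[:batch_size], embeddings, index_name=index_name, namespace=namespace
)
for i in range(batch_size, len(chunks), batch_size):
    vectordb.add_documents(chunks[i : i + batch_size], namespace=namespace)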

For reference, here is the relevant code in the HuggingFaceInferenceAPIEmbeddings class:

response = requests.post(
    self._api_url,
    headers=self._headers,
    json={
        "inputs": texts,
        "options": {"wait_for_model": True, "use_cache": True},
    },
)

And here is the call to Pinecone.from_documents() from your example:

vectordb = Pinecone.from_documents(chunks, embeddings, index_name=index_name, namespace=namespace)

I hope this information is helpful. If you have any further questions or need additional clarification, please don't hesitate to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.