langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

inference configurations are invalid for BedrockEmbeddings models #9127

Closed HannaHUp closed 7 months ago

HannaHUp commented 11 months ago

System Info

In SageMaker. langchain==0.0.256 or 0.0.249 (I tried both)
Image: Data Science 3.0
Kernel: Python 3
Instance type: ml.t3.medium (2 vCPU + 4 GiB)

Who can help?

No response

Information

Related Components

Reproduction

I'm trying to follow this notebook: https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/03_QuestionAnswering/01_qa_w_rag_claude.ipynb

I increased the data input size:

import os
from urllib.request import urlretrieve

os.makedirs("data", exist_ok=True)
files = [
    "https://www.irs.gov/pub/irs-pdf/p1544.pdf",
    "https://www.irs.gov/pub/irs-pdf/p15.pdf",
    "https://www.irs.gov/pub/irs-pdf/p1212.pdf",
    "https://www.irs.gov/pub/irs-pdf/p3.pdf",
    "https://www.irs.gov/pub/irs-pdf/p17.pdf",
    "https://www.irs.gov/pub/irs-pdf/p51.pdf",
    "https://www.irs.gov/pub/irs-pdf/p54.pdf",
]
for url in files:
    file_path = os.path.join("data", url.rpartition("/")[2])
    urlretrieve(url, file_path)

My data: Average length among 1012 documents loaded is 2320 characters. After the split we have 1167 documents, more than the original 1012. Average length among 1167 documents (after split) is 2011 characters.

import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("./data/")

documents = loader.load()
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 1000,
    chunk_overlap  = 100,
)
docs = text_splitter.split_documents(documents)

avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import FAISS
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

vectorstore_faiss = FAISS.from_documents(
    docs,
    bedrock_embeddings,
)

wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

Funny thing is, if my input is smaller (docs[:5]), it works: vectorstore_faiss = FAISS.from_documents(docs[:5], bedrock_embeddings)

error:


ValidationException                       Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:120, in BedrockEmbeddings._embedding_func(self, text)
    119 try:
--> 120     response = self.client.invoke_model(
    121         body=body,
    122         modelId=self.model_id,
    123         accept="application/json",
    124         contentType="application/json",
    125     )
    126     response_body = json.loads(response.get("body").read())

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
    979     error_class = self.exceptions.from_code(error_code)
--> 980     raise error_class(parsed_response, operation_name)
    981 else:

ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[35], line 10
      4 from langchain.indexes.vectorstore import VectorStoreIndexWrapper
      8 #
      9 #
---> 10 vectorstore_faiss = FAISS.from_documents(
     11     docs,
     12     bedrock_embeddings,
     13 )
     15 wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

File /opt/conda/lib/python3.10/site-packages/langchain/vectorstores/base.py:420, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    418 texts = [d.page_content for d in documents]
    419 metadatas = [d.metadata for d in documents]
--> 420 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File /opt/conda/lib/python3.10/site-packages/langchain/vectorstores/faiss.py:607, in FAISS.from_texts(cls, texts, embedding, metadatas, ids, **kwargs)
    581 @classmethod
    582 def from_texts(
    583     cls,
    (...)
    588     **kwargs: Any,
    589 ) -> FAISS:
    590     """Construct FAISS wrapper from raw documents.
    591
    592     This is a user friendly interface that:
    (...)
    605         faiss = FAISS.from_texts(texts, embeddings)
    606     """
--> 607     embeddings = embedding.embed_documents(texts)
    608     return cls.__from(
    609         texts,
    610         embeddings,
    (...)
    614         **kwargs,
    615     )

File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:148, in BedrockEmbeddings.embed_documents(self, texts, chunk_size)
    146 results = []
    147 for text in texts:
--> 148     response = self._embedding_func(text)
    149     results.append(response)
    150 return results

File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:129, in BedrockEmbeddings._embedding_func(self, text)
    127     return response_body.get("embedding")
    128 except Exception as e:
--> 129     raise ValueError(f"Error raised by inference endpoint: {e}")

ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

Expected behavior

I would like to generate embeddings for the entire corpus and store them in a vector store.
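
Editor's note: since docs[:5] embeds fine while the full set fails, a minimal way to narrow this down is to embed each chunk individually and record which ones are rejected. This is only a diagnostic sketch, not part of the original notebook; it reuses the docs and bedrock_embeddings objects from the snippet above and relies on the fact (visible in the traceback) that BedrockEmbeddings wraps the ValidationException in a ValueError.

# Diagnostic sketch (assumes `docs` and `bedrock_embeddings` from above):
# embed each chunk on its own so one oversized chunk doesn't abort the whole batch.
failing = []
for i, doc in enumerate(docs):
    try:
        bedrock_embeddings.embed_query(doc.page_content)
    except ValueError as err:  # BedrockEmbeddings re-raises the ValidationException as ValueError
        failing.append(i)
        print(f"chunk {i} ({len(doc.page_content)} chars): {err}")
print(f"{len(failing)} of {len(docs)} chunks were rejected")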

HannaHUp commented 11 months ago

I'm guessing my input is too big. I need to reset the default inference configurations.

dosubot[bot] commented 11 months ago

Answer generated by a 🤖

Hello,

From the error message you provided, it seems like the issue is related to the size of the input data you're trying to process. The error ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid typically occurs when the input data exceeds the maximum limit that the model can handle.

In the current implementation of LangChain, the maximum batch size for embedding requests is set to 256. This is indicated by the constant MAX_BATCH_SIZE in the code. If your input data exceeds this limit, you might encounter issues.

To handle larger inputs, you can modify the inference configurations in LangChain by breaking the inputs into smaller chunks and sending multiple requests. Here's an example of how you could modify the _embed method to handle larger inputs:

def _embed(self, input: List[str]) -> List[List[float]]:
    _model_kwargs = self.model_kwargs or {}
    headers = {
        "Authorization": f"bearer {self.deepinfra_api_token}",
        "Content-Type": "application/json",
    }
    embeddings = []
    for i in range(0, len(input), 100):
        chunk = input[i:i+100]
        try:
            res = requests.post(
                f"https://api.deepinfra.com/v1/inference/{self.model_id}",
                headers=headers,
                json={"inputs": chunk, "normalize": self.normalize, **_model_kwargs},
            )
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Error raised by inference endpoint: {e}")

        if res.status_code != 200:
            raise ValueError(
                "Error raised by inference API HTTP code: %s, %s"
                % (res.status_code, res.text)
            )
        try:
            t = res.json()
            chunk_embeddings = t["embeddings"]
            embeddings.extend(chunk_embeddings)
        except requests.exceptions.JSONDecodeError as e:
            raise ValueError(
                f"Error raised by inference API: {e}.\nResponse: {res.text}"
            )

    return embeddings

This modification will break the input into chunks of size 100 and send a separate request for each chunk. The embeddings for each chunk are then added to the embeddings list, which is returned at the end. You can adjust the chunk size as needed to handle your specific input size.

Please note that this is a workaround and might not be the most efficient solution. It would be best to adjust the chunk size based on the maximum input size that the BedrockEmbeddings model can handle. Unfortunately, I couldn't find specific information about the maximum input size for the BedrockEmbeddings model in the LangChain repository. You might want to consider reaching out to the maintainers of the BedrockEmbeddings model for more information.

I hope this helps! Let me know if you have any other questions.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

HannaHUp commented 11 months ago

@dosu-bot The issue_text length is 994 characters and I got """ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid"""

I have tried other text with a length of 1418 characters and got no issue, though.

HannaHUp commented 11 months ago

@dosu-bot In the notebook from AWS, it says: """Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters.""" So I believe my chunk_size = 1000 is acceptable.

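Editor's note: the "512 tokens ≈ ~2000 characters" rule of thumb assumes ordinary prose; PDF text full of numbers, tables, or unusual spacing can pass 512 tokens well below 1000 characters. A rough way to check is to count tokens per chunk. This sketch uses tiktoken's cl100k_base encoding only as a proxy, since the Titan embedding model has its own tokenizer, so counts near the limit should be treated as approximate:

# Rough token count per chunk; cl100k_base only approximates the Titan tokenizer,
# so treat anything close to the 512-token limit as suspect rather than exact.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for i, doc in enumerate(docs):
    n_tokens = len(enc.encode(doc.page_content))
    if n_tokens > 450:  # leave some headroom under 512
        print(f"chunk {i}: {n_tokens} tokens from {len(doc.page_content)} chars")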

HannaHUp commented 11 months ago

@dosu-bot I don't think I'm using DeepInfraEmbeddings, though. The error comes from /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py

dosubot[bot] commented 11 months ago

🤖

Hello,

Thank you for reaching out and providing a detailed description of your issue. It seems like you're encountering a ValidationException error due to the size of the input data when calling the InvokeModel operation with the BedrockEmbeddings model.

Based on the context provided, it appears that the issue might be related to the number of input texts that are being sent to the model in a single request. In the LangChain framework, you can adjust the chunk_size parameter in the embed_documents method of the SagemakerEndpointEmbeddings class to control this.

Here is how you can adjust the chunk_size:

from langchain.embeddings import SagemakerEndpointEmbeddings

# Initialize the SagemakerEndpointEmbeddings instance
se = SagemakerEndpointEmbeddings(
    endpoint_name="my-endpoint-name",
    region_name="us-west-2",
    credentials_profile_name="default"
)

# List of texts to embed
texts = ["text1", "text2", "text3", ...]

# Adjust the chunk_size to a smaller value if the input size is large
chunk_size = 32  # Adjust this value based on your needs

# Compute embeddings
embeddings = se.embed_documents(texts, chunk_size)

In the above code, chunk_size is set to 32. You can adjust this value based on your needs and the limitations of your model and infrastructure. The smaller the chunk_size, the fewer texts will be sent to the model in a single request, which can help avoid the ValidationException error if the input size is too large.

As for the maximum input size that the InvokeModel operation can handle in the LangChain framework, I wasn't able to find a specific answer within the repository. It might depend on various factors such as the specific model you're using, the infrastructure, and the AWS SageMaker limitations.

Please try adjusting the chunk_size and let me know if this resolves your issue. If not, please provide more details about the error and I'll be glad to assist you further.

Best, Dosu

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

VadneyK commented 11 months ago

I was having the same issue when I changed the model to amazon.titan-tg1-large instead of the default anthropic.claude-v1.

My solution was to change

llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})

to

llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock)

I don't see what model you are using, so I don't know if this is your issue. It seems the error we were given is a non-descriptive one that is returned for any validation error on a model's input, so it's hard to pinpoint what exactly is wrong.

Remember, the InvokeModel API input is different per model, as shown in the bedrock_boto3_setup notebook:

Amazon Titan Large

Input

{   
    "inputText": "<prompt>",
    "textGenerationConfig" : { 
        "maxTokenCount": 512,
        "stopSequences": [],
        "temperature": 0.1,  
        "topP": 0.9
    }
}

Output

{
    "inputTextTokenCount": 613,
    "results": [{
        "tokenCount": 219,
        "outputText": "<output>"
    }]
}
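
Editor's note: to separate "the text itself is too long" from "the request body carries parameters this model rejects", it can help to reproduce the exact call BedrockEmbeddings makes (per the traceback above: invoke_model with a JSON body, modelId, accept, and contentType) with nothing but inputText in the body. A minimal sketch, assuming boto3_bedrock is the same Bedrock client used earlier; the model id below is only a placeholder and should match whatever your BedrockEmbeddings instance is configured with:

import json

# Reproduce the raw embedding call from the traceback with a plain body.
# `boto3_bedrock` and the model id are assumptions; substitute the client and
# model_id your BedrockEmbeddings actually uses.
text = "word " * 600  # or the page_content of a chunk that fails above
body = json.dumps({"inputText": text})
try:
    response = boto3_bedrock.invoke_model(
        body=body,
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json",
    )
    embedding = json.loads(response.get("body").read())["embedding"]
    print(len(embedding), "dimensions")
except Exception as err:
    print("rejected:", err)

If this still fails with only inputText in the body, the rejection is driven by the text length rather than by extra inference parameters.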

Kevin-McIsaac commented 11 months ago

I have the same problem. The issue appears to be the number of tokens (or the length of the text):

from langchain.embeddings import BedrockEmbeddings
BedrockEmbeddings(region_name="us-east-1").embed_query('word '*509)

works

from langchain.embeddings import BedrockEmbeddings
BedrockEmbeddings(region_name="us-east-1").embed_query('word '*510)

gives a validation error

ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

HannaHUp commented 11 months ago

Yes, it is an embedding model issue. I kept setting the chunk size smaller and smaller to get the embedding model to work, but it doesn't make sense. Since the input is limited to 512 tokens, which roughly translates to ~2000 characters, I assumed none of my inputs should give me an error, because the maximum was 1000 characters, well below 2000 characters.

Kevin-McIsaac commented 10 months ago

Another way to solve the chunking problem is to use from_tiktoken_encoder, which counts chunk sizes in tokens rather than characters, so each chunk stays under the model's token limit regardless of how dense the text is. E.g.:

import glob

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

tiktoken_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=0
)
pdf_docs = [
    doc
    for pdf in glob.glob("data/*.pdf")
    for doc in PyPDFLoader(pdf).load_and_split(tiktoken_splitter)
]

dosubot[bot] commented 7 months ago

Hi, @HannaHUp,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems like you encountered a "ValidationException" error when trying to generate embeddings for a large corpus of documents using Amazon SageMaker with the langchain library. Suggestions were made to adjust the chunk size and modify the inference configurations to handle larger inputs. However, the current status of the issue is unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!