Closed HannaHUp closed 7 months ago
I'm guessing my input is too big. I need to reset the default inference configurations.
Answer generated by a 🤖
Hello,
From the error message you provided, it seems the issue is related to the size of the input data you're trying to process. The error ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid typically occurs when the input data exceeds the maximum limit that the model can handle.
In the current implementation of LangChain, the maximum batch size for embedding requests is set to 256. This is indicated by the constant MAX_BATCH_SIZE in the code. If your input data exceeds this limit, you might encounter issues.
To handle larger inputs, you can modify the inference configurations in LangChain by breaking the inputs into smaller chunks and sending multiple requests. Here's an example of how you could modify the _embed method to handle larger inputs:
def _embed(self, input: List[str]) -> List[List[float]]:
    _model_kwargs = self.model_kwargs or {}
    headers = {
        "Authorization": f"bearer {self.deepinfra_api_token}",
        "Content-Type": "application/json",
    }
    embeddings = []
    # Send the inputs in chunks of 100 texts per request
    for i in range(0, len(input), 100):
        chunk = input[i:i+100]
        try:
            res = requests.post(
                f"https://api.deepinfra.com/v1/inference/{self.model_id}",
                headers=headers,
                json={"inputs": chunk, "normalize": self.normalize, **_model_kwargs},
            )
        except requests.exceptions.RequestException as e:
            raise ValueError(f"Error raised by inference endpoint: {e}")
        if res.status_code != 200:
            raise ValueError(
                "Error raised by inference API HTTP code: %s, %s"
                % (res.status_code, res.text)
            )
        try:
            t = res.json()
            chunk_embeddings = t["embeddings"]
            embeddings.extend(chunk_embeddings)
        except requests.exceptions.JSONDecodeError as e:
            raise ValueError(
                f"Error raised by inference API: {e}.\nResponse: {res.text}"
            )
    return embeddings
This modification will break the input into chunks of size 100 and send a separate request for each chunk. The embeddings for each chunk are then added to the embeddings list, which is returned at the end. You can adjust the chunk size as needed to handle your specific input size.
Please note that this is a workaround and might not be the most efficient solution. It would be best to adjust the chunk size based on the maximum input size that the BedrockEmbeddings model can handle. Unfortunately, I couldn't find specific information about the maximum input size for the BedrockEmbeddings model in the LangChain repository. You might want to consider reaching out to the maintainers of the BedrockEmbeddings model for more information.
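The batching idea described above does not depend on any particular provider. As a minimal sketch, assuming a hypothetical callable embed_batch that takes a list of texts and returns one vector per text (it is not a LangChain API), the loop could be factored out like this:

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Call embed_batch on consecutive slices of texts and concatenate the results.

    embed_batch is a hypothetical stand-in for whatever function sends
    one request to the embedding endpoint.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        embeddings.extend(embed_batch(batch))
    return embeddings
```

Note that this only limits how many texts go into one request. It does not shorten any individual text, so it would not help if a single text is over the model's token limit.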
I hope this helps! Let me know if you have any other questions.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot
The issue_text len is 994
I got """ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid"""
I tried another text whose length is 1418 and got no error.
@dosu-bot The AWS notebook says: """ Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters. """ So I believe my chunk_size = 1000 is acceptable.
@dosu-bot I don't think I'm using DeepInfraEmbeddings though. The error is from opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py
🤖
Hello,
Thank you for reaching out and providing a detailed description of your issue. It seems like you're encountering a ValidationException error due to the size of the input data when calling the InvokeModel operation with the BedrockEmbeddings model.
Based on the context provided, it appears that the issue might be related to the number of input texts that are being sent to the model in a single request. In the LangChain framework, you can adjust the chunk_size parameter in the embed_documents method of the SagemakerEndpointEmbeddings class to control this.
Here is how you can adjust the chunk_size:
from langchain.embeddings import SagemakerEndpointEmbeddings

# Initialize the SagemakerEndpointEmbeddings instance
se = SagemakerEndpointEmbeddings(
    endpoint_name="my-endpoint-name",
    region_name="us-west-2",
    credentials_profile_name="default"
)

# List of texts to embed
texts = ["text1", "text2", "text3", ...]

# Adjust the chunk_size to a smaller value if the input size is large
chunk_size = 32  # Adjust this value based on your needs

# Compute embeddings
embeddings = se.embed_documents(texts, chunk_size)
In the above code, chunk_size is set to 32. You can adjust this value based on your needs and the limitations of your model and infrastructure. The smaller the chunk_size, the fewer texts will be sent to the model in a single request, which can help avoid the ValidationException error if the input size is too large.
As for the maximum input size that the InvokeModel operation can handle in the LangChain framework, I wasn't able to find a specific answer within the repository. It might depend on various factors such as the specific model you're using, the infrastructure, and the AWS SageMaker limitations.
Please try adjusting the chunk_size and let me know if this resolves your issue. If not, please provide more details about the error and I'll be glad to assist you further.
Best, Dosu
I was having the same issue when I changed the model to amazon.titan-tg1-large instead of the default anthropic.claude-v1.
My solution was to change
llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})
to
llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock)
I don't see which model you are using, so I don't know if this is your issue. The error we were given seems to be a non-descriptive one that is raised for any validation error on a model input, so it's hard to pinpoint what exactly is wrong.
Remember that the InvokeModel API input is different per model, as shown in the bedrock_boto3_setup notebook:
{
    "inputText": "<prompt>",
    "textGenerationConfig": {
        "maxTokenCount": 512,
        "stopSequences": [],
        "temperature": 0.1,
        "topP": 0.9
    }
}

{
    "inputTextTokenCount": 613,
    "results": [{
        "tokenCount": 219,
        "outputText": "<output>"
    }]
}
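To make the per-model difference concrete, here is a rough sketch of building the request body by provider. The Titan shape follows the JSON above. The Anthropic shape (prompt plus max_tokens_to_sample) is my assumption based on the kwarg mentioned in this comment, and build_invoke_body is a hypothetical helper, not part of LangChain or boto3:

```python
import json


def build_invoke_body(model_id: str, prompt: str, max_tokens: int = 200) -> str:
    """Return a JSON request body shaped for the provider named in model_id.

    Hypothetical helper for illustration only; check the provider's
    documentation for the authoritative list of parameters.
    """
    provider = model_id.split(".")[0]
    if provider == "amazon":
        # Titan text models: generation settings live under textGenerationConfig
        payload = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "stopSequences": [],
                "temperature": 0.1,
                "topP": 0.9,
            },
        }
    elif provider == "anthropic":
        # Assumed Claude shape: the prompt is passed through unchanged here
        payload = {"prompt": prompt, "max_tokens_to_sample": max_tokens}
    else:
        raise ValueError(f"No body template for provider: {provider}")
    return json.dumps(payload)
```

This would explain why passing max_tokens_to_sample to a Titan model produced the same ValidationException: that key is not part of the Titan body.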
I have the same problem. The issue appears to be the number of tokens (or length of the text).
from langchain.embeddings import BedrockEmbeddings
BedrockEmbeddings(region_name="us-east-1").embed_query('word '*509)
works
from langchain.embeddings import BedrockEmbeddings
BedrockEmbeddings(region_name="us-east-1").embed_query('word '*510)
gives a validation error
ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid
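Given that cutoff, one workaround is to split long texts by word count before calling embed_query. This is only a sketch: split_by_words is a hypothetical helper, and the default of 500 words is a guess taken from the 509/510 experiment above. Word count is just a proxy for the model's real token count, so ordinary prose may need a lower limit.

```python
from typing import List


def split_by_words(text: str, max_words: int = 500) -> List[str]:
    """Split text into pieces of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each piece could then be embedded separately, e.g. by looping over split_by_words(text) and passing each piece to embed_query.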
Yes, it is an embedding model issue. I keep setting the chunk size smaller and smaller to get the embedding model to work, but it doesn't make sense. Since the input is limited to 512 tokens, which roughly translates to ~2000 characters, I assumed none of my inputs would give an error, because the maximum was 1000 characters, well below 2000.
Another way to solve the chunking problem is to use from_tiktoken_encoder to count by tokens rather than characters, e.g.,
import glob

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split by token count (as measured by tiktoken) instead of character count
tiktoken_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=0
)
pdf_docs = [
    doc
    for pdf in glob.glob('data/*.pdf')
    for doc in PyPDFLoader(pdf).load_and_split(tiktoken_splitter)
]
Hi, @HannaHUp,
I'm helping the LangChain team manage their backlog and am marking this issue as stale. It seems like you encountered a "ValidationException" error when trying to generate embeddings for a large corpus of documents using Amazon SageMaker with the langchain library. Suggestions were made to adjust the chunk size and modify the inference configurations to handle larger inputs. However, the current status of the issue is unresolved.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
System Info
Running in SageMaker. langchain==0.0.256 or 0.0.249 (I tried both)
Image: Data Science 3.0
Kernel: Python 3
Instance type: ml.t3.medium (2 vCPU + 4 GiB)
Who can help?
No response
Information
Related Components
Reproduction
I'm trying to follow this notebook: https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/03_QuestionAnswering/01_qa_w_rag_claude.ipynb
I increased the data input size: from urllib.request import urlretrieve
My data input: Average length among 1012 documents loaded is 2320 characters. After the split we have 1167 documents more than the original 1012. Average length among 1167 documents (after split) is 2011 characters.
The funny thing is that with fewer documents (docs[:5]) it worked: vectorstore_faiss = FAISS.from_documents( docs[:5], bedrock_embeddings, )
error:
ValidationException                       Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:120, in BedrockEmbeddings._embedding_func(self, text)
    119 try:
--> 120     response = self.client.invoke_model(
    121         body=body,
    122         modelId=self.model_id,
    123         accept="application/json",
    124         contentType="application/json",
    125     )
    126     response_body = json.loads(response.get("body").read())

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:535, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    534 # The "self" in this scope is referring to the BaseClient.
--> 535 return self._make_api_call(operation_name, kwargs)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:980, in BaseClient._make_api_call(self, operation_name, api_params)
    979     error_class = self.exceptions.from_code(error_code)
--> 980     raise error_class(parsed_response, operation_name)
    981 else:

ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[35], line 10
      4 from langchain.indexes.vectorstore import VectorStoreIndexWrapper
      6 
      7 
      8 #
      9 #
---> 10 vectorstore_faiss = FAISS.from_documents(
     11     docs,
     12     bedrock_embeddings,
     13 )
     15 wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

File /opt/conda/lib/python3.10/site-packages/langchain/vectorstores/base.py:420, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    418 texts = [d.page_content for d in documents]
    419 metadatas = [d.metadata for d in documents]
--> 420 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File /opt/conda/lib/python3.10/site-packages/langchain/vectorstores/faiss.py:607, in FAISS.from_texts(cls, texts, embedding, metadatas, ids, **kwargs)
    581 @classmethod
    582 def from_texts(
    583     cls,
   (...)
    588     **kwargs: Any,
    589 ) -> FAISS:
    590     """Construct FAISS wrapper from raw documents.
    591 
    592     This is a user friendly interface that:
   (...)
    605         faiss = FAISS.from_texts(texts, embeddings)
    606     """
--> 607     embeddings = embedding.embed_documents(texts)
    608     return cls.__from(
    609         texts,
    610         embeddings,
   (...)
    614         **kwargs,
    615     )

File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:148, in BedrockEmbeddings.embed_documents(self, texts, chunk_size)
    146 results = []
    147 for text in texts:
--> 148     response = self._embedding_func(text)
    149     results.append(response)
    150 return results

File /opt/conda/lib/python3.10/site-packages/langchain/embeddings/bedrock.py:129, in BedrockEmbeddings._embedding_func(self, text)
    127     return response_body.get("embedding")
    128 except Exception as e:
--> 129     raise ValueError(f"Error raised by inference endpoint: {e}")

ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid
Expected behavior
I would like to generate embeddings for the entire corpus and store them in a vector store.
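Since the traceback shows embed_documents calling the model once per text, one way to narrow this down is to embed the documents one at a time and record which ones fail. A minimal sketch, assuming embed_one is any callable that embeds a single string (find_failing_texts is a hypothetical helper, not a LangChain function):

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_texts(
    texts: Sequence[str],
    embed_one: Callable[[str], object],
) -> List[Tuple[int, int, str]]:
    """Try to embed each text and collect (index, character length, error message) for failures."""
    failures: List[Tuple[int, int, str]] = []
    for index, text in enumerate(texts):
        try:
            embed_one(text)
        except Exception as exc:  # broad on purpose: the wrapper re-raises as ValueError
            failures.append((index, len(text), str(exc)))
    return failures
```

With the objects from the notebook this might be called as find_failing_texts([d.page_content for d in docs], bedrock_embeddings.embed_query), which makes one request per document. Comparing the lengths of the failing documents against the ones that pass should show whether the error tracks text length.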