In the CloudWatch data logs (on failure):
Received client error (413) from primary with message "Failed to buffer the request body: length limit exceeded
The request behind the above log was 2.4 MB in size. Our payload limit env var (PAYLOAD_LIMIT) is set so that the endpoint should support 200 MB. The endpoint does handle payloads up to 2 MB, so the limitation seems to stem from the payload size override not being recognized rather than from token lengths.
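As a sanity check on the sizes involved, the sketch below measures the reproduction payload as serialized JSON and compares it with the two thresholds mentioned above. It assumes the payload is serialized with `json.dumps` using default separators and that "MB" means 10^6 bytes; neither is stated in this report, and `serialized_size_bytes` is a hypothetical helper name.

```python
import json

# Same payload as in the reproduction: 300 inputs, every third one 25,000 characters long.
data = ["My name is Clara and I am" * 1000, "My name is Clara and I am", "I"] * 100
inputs = {"inputs": data}


def serialized_size_bytes(payload):
    """Return the size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


# Thresholds taken from the report, expressed in decimal megabytes (an assumption).
OBSERVED_CEILING_BYTES = 2 * 1000 * 1000      # payloads up to about 2 MB succeed
CONFIGURED_LIMIT_BYTES = 200 * 1000 * 1000    # what PAYLOAD_LIMIT is meant to allow

size = serialized_size_bytes(inputs)
print(f"serialized size: {size:,} bytes ({size / 1e6:.2f} MB)")
print("exceeds observed ~2 MB ceiling:", size > OBSERVED_CEILING_BYTES)
print("within configured 200 MB limit:", size < CONFIGURED_LIMIT_BYTES)
```

If the payload lands between the two thresholds, as it does here, a 413 response points at the limit override not taking effect rather than at the input itself.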
# Build a payload of 300 inputs; every third string is 25,000 characters long
data = ["My name is Clara and I am" * 1000, "My name is Clara and I am", "I"] * 100
inputs = {
    "inputs": data,
}
# Upload the payload to S3 (upload_to_s3 is a helper not shown in this report)
bucket = '<S3_BUCKET>'
key = 'TEI_embedding/input/data.json'
payload_s3_path = upload_to_s3(inputs, bucket, key)
print(f"Uploaded payload to: {payload_s3_path}")
# Invoke the asynchronous endpoint (invoke_async_endpoint is also a helper not shown here)
endpoint_name = 'experimental-tei-endpoint'
output_location = invoke_async_endpoint(endpoint_name, payload_s3_path)
print(f"Asynchronous inference initiated. Output location: {output_location}")
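The two helpers called above are not included in the report. A minimal sketch of what they might look like with boto3 follows; the bodies, the optional client parameters, and the JSON content type are assumptions for illustration, not the reporter's actual code. The boto3 calls used are `put_object` on the S3 client and `invoke_endpoint_async` on the `sagemaker-runtime` client.

```python
import json


def upload_to_s3(payload, bucket, key, s3_client=None):
    """Serialize the payload as JSON, store it in S3, and return its s3:// URI."""
    if s3_client is None:
        import boto3  # imported lazily so the sketch can be exercised with a stub client
        s3_client = boto3.client("s3")
    body = json.dumps(payload).encode("utf-8")
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="application/json"
    )
    return f"s3://{bucket}/{key}"


def invoke_async_endpoint(endpoint_name, input_location, runtime_client=None):
    """Queue an asynchronous inference request and return the S3 output location."""
    if runtime_client is None:
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_location,      # S3 URI of the uploaded payload
        ContentType="application/json",    # assumed: the container receives JSON
    )
    return response["OutputLocation"]
```

With an asynchronous endpoint the request body travels through S3 rather than through the invoke call, so in this sketch the 413 would be raised when the payload is forwarded to the container, not at upload time.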
Expected behavior
Payloads up to 200 MB should be supported, in line with PAYLOAD_LIMIT and with existing async endpoints (which should support up to 1 GB per the SageMaker documentation).
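For reference, a 200 MB limit could be expressed in the container environment roughly as sketched below. This assumes PAYLOAD_LIMIT is read as a byte count and that "200 MB" means 200 * 10^6 bytes; the report does not show the value actually configured, and the `environment` dict name is only illustrative.

```python
# Assumed: PAYLOAD_LIMIT is a size in bytes, and 200 MB means 200 * 10^6 bytes.
PAYLOAD_LIMIT_BYTES = 200 * 1000 * 1000

# Container environment variables are passed as strings, so the number is
# converted explicitly. Other variables the deployment needs are omitted here.
environment = {
    "PAYLOAD_LIMIT": str(PAYLOAD_LIMIT_BYTES),
}

print(environment)
```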
System Info
TEI image v1.4.0
AWS SageMaker deployment
1 x ml.g5.xlarge instance
Asynchronous deployment
Link to prior discussion: https://discuss.huggingface.co/t/async-tei-deployment-cannot-handle-requests-greater-than-2mb/107529/1
On deployment, we see two relevant logs: one in the logs on initial deployment, and one in the CloudWatch data logs on failure (the 413 error quoted above).