I still think the documentation is not very clear on this particular point, but thanks for the heads-up. I think I'll just try calling the API with a record that big to find out what the real limit is.
Totally agree on the quality of the documentation. If it helps, I am going to put in a tech support request asking for clarification, and I'll post the answer here.
I played with it a bit. So far, with boto3==1.9.49 and botocore==1.12.49, the size of the partition key is ignored. I was able to put records of exactly 1048576 bytes and got an exception for anything bigger. I tried partition keys of different sizes, and unless a key exceeds the allowed 256 characters, no exception is raised. Adding ExplicitHashKey does not change anything.
With batch put_records() it is essentially the same - max 1 MB per record and max 5 MB for all records combined, no matter what the partition keys are.
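For reference, a minimal sketch of that probing (the stream name is a placeholder and the calls need a live stream; behaviour as observed above with boto3==1.9.49):

```python
import boto3

kinesis = boto3.client('kinesis')
payload = b'x' * 1048576  # exactly 1 MiB of data

# Accepted, even with a maximum-length (256 char) partition key
kinesis.put_record(StreamName='test-stream', Data=payload,
                   PartitionKey='a' * 256)

# One byte over the 1 MiB data limit - raises an exception
kinesis.put_record(StreamName='test-stream', Data=payload + b'x',
                   PartitionKey='k')

# Batch variant: five 1 MiB records fill the 5 MiB request limit exactly;
# per the observation above, partition keys are not counted here either
kinesis.put_records(StreamName='test-stream',
                    Records=[{'Data': payload, 'PartitionKey': str(i)}
                             for i in range(5)])
```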
However, in async-kinesis-client I'm calculating the record size wrong - since Data is just a plain bytes or bytearray value, a simple len() should be used. Going to fix this soon.
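The distinction matters because (assuming the old size helper was based on sys.getsizeof) the Python object size is not the payload size:

```python
import sys

data = b'x' * 1048576  # 1 MiB payload

print(len(data))             # 1048576 - the byte count Kinesis actually sees
print(sys.getsizeof(data))   # typically 1048609 on 64-bit CPython 3 - includes object overhead
```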
Wow, that's amazing... blows my mind, because that's not what the docs imply (as confusing as they are).
Anyways, I already submitted the ticket, so I'll be happy to share the response.
Size calculation reworked in 0.1.3. @bisoldi - still would love to know AWS's response, though. Thanks. Closing.
@whale2 Just to close the loop on this... It seems the documentation does not match reality.
My email to AWS:
I'm seeking clarification on the PutRecords API limitations, specifically the individual record size limit. Below is the relevant paragraph:
"Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request, including partition keys. Each shard can support writes up to 1,000 records per second, up to a maximum data write total of 1 MiB per second."
My question is, does the 1MB limit per record INCLUDE the partition key and explicit hash key? Or does it include only the data blob itself?
And their response:
To answer your question, the 1MB upper limit that applies to the size of each record includes the data blob (the payload before base64-encoding) as well as the partition key [1]. Therefore, we can look at it as follows: size of data blob + partition key <= 1MB
References: [1] https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html#API_PutRecord_RequestParameters
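So, taking their formula at face value, the maximum payload shrinks by the length of the partition key (UTF-8 encoding is my assumption here):

```python
MAX_RECORD_SIZE = 1024 * 1024          # 1 MiB

partition_key = 'user-42'              # hypothetical key, 7 bytes in UTF-8
max_payload = MAX_RECORD_SIZE - len(partition_key.encode('utf-8'))
print(max_payload)                     # 1048569 bytes of data allowed for this record
```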
@bisoldi I think I should check deeper. Maybe it is boto3 that doesn't raise the exception, while Kinesis actually trims or silently drops a record whose data alone is 1 MB once the partition key is added. Shame on me - I didn't check whether I really received those messages.
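A quick way to verify delivery (a hedged sketch; the stream and shard values are placeholders) would be to read the shard back and check the payload length:

```python
import boto3

kinesis = boto3.client('kinesis')

# Read from the start of the first shard and check what actually arrived
it = kinesis.get_shard_iterator(StreamName='test-stream',
                                ShardId='shardId-000000000000',
                                ShardIteratorType='TRIM_HORIZON')['ShardIterator']
resp = kinesis.get_records(ShardIterator=it)
for rec in resp['Records']:
    print(rec['PartitionKey'], len(rec['Data']))  # expect 1048576 if nothing was trimmed
```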
Line 103 in kinesis_producer.py, you have:
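```python
record_size = utils._sizeof(datum.get('Data'))  # the line quoted in the suggestion below
```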
I think that's incorrect: you're comparing the size of the data alone against the 1 MB per-record limit, when it sounds like it should be the size of the data plus the partition key and explicit hash key. You use the correct methodology (assuming my understanding is correct) on line 12, where you compare the total datum size against the maximum request size (5 MB).
Below is from the AWS Kinesis PutRecords API documentation:
"Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request, including partition keys. Each shard can support writes up to 1,000 records per second, up to a maximum data write total of 1 MiB per second."
And here is the sample request the documentation provides:
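(The request body shape from that page, with placeholder values:)

```json
{
   "Records": [
      {
         "Data": "<base64-encoded blob>",
         "ExplicitHashKey": "string",
         "PartitionKey": "string"
      }
   ],
   "StreamName": "string"
}
```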
The way I'm reading the documentation, the "record" that can be a maximum of 1MB includes the Data, ExplicitHashKey and PartitionKey.
I think you can remove `record_size = utils._sizeof(datum.get('Data'))` and compare `datum_size` against `MAX_RECORD_SIZE`.
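A sketch of what that per-record check might look like (not a drop-in patch; the names follow the snippet above, and counting the explicit hash key reflects my reading of the docs):

```python
MAX_RECORD_SIZE = 1024 * 1024   # 1 MiB, matching the constant referenced above

datum = {'Data': b'payload', 'PartitionKey': 'key-1', 'ExplicitHashKey': ''}

# Count everything the docs appear to count towards the per-record limit:
# data blob + partition key + explicit hash key
record_size = len(datum.get('Data', b''))
record_size += len(datum.get('PartitionKey', '').encode('utf-8'))
record_size += len(datum.get('ExplicitHashKey', '').encode('utf-8'))

if record_size > MAX_RECORD_SIZE:
    raise ValueError('Record of {} bytes exceeds the {} byte limit'
                     .format(record_size, MAX_RECORD_SIZE))
```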