Open james-bjss opened 3 years ago
I got the exact same problem!
Have also raised this to AWS support who have escalated to MSK Team, to confirm if the 429 response is expected behavior.
Update on the above. AWS MSK team are reviewing the 429 response code and may remediate this, but no dates have been given.
@james-bjss any updates on this?
Hi @marcincuber - Unfortunately I never got a response back from AWS support on this. It was passed on to the MSK team and the ticket closed. In theory it could be handled in the provider by checking for the specific header it returns, but not sure if the team would want to put the workaround in code.
Have you had this issue recently? I haven't retested so it's entirely possible it could have been resolved upstream
@james-bjss I haven't tested it. However, I will be starting work on Kafka this week. This is an interesting issue that you mentioned here so I will definitely check whether I can reproduce.
I also contacted AWS support about this issue and changing 429 to something else was added to their backlog - no ETA though.
"Thank you for providing the change request. I have added this to the backlog and it will be prioritized accordingly."
Community Note
Terraform CLI and Terraform AWS Provider Version
Affected Resource(s)
Terraform Configuration Files
Debug Output
Gist with relevant logs
Expected Behavior
The apply should fail early indicating that the upgrade can't be performed due to the High Partition count.
Actual Behavior
The PUT call to /v1/clusters/clusterArn/version fails with an HTTP 429 and the header X-Amzn-Errortype: HighPartitionCountException.
TF output reports that it is retrying (x25).
Steps to Reproduce
Deploy a cluster with kafka_version="2.4.1.1" via TF.
Change kafka_version to 2.5.1 and apply to trigger the upgrade.
Important Factoids
The call to /v1/clusters/clusterArn/version returns an HTTP 429 (429 Too Many Requests) with X-Amzn-Errortype: HighPartitionCountException. However, I am not sure a 429 is the correct code in this instance, so this could be an issue on the AWS API side.
There may be an argument that it should keep retrying in case the partition count drops, but in my opinion I would rather the apply fail early with an indication of the actual error. In theory TF is honoring the 429 response by retrying, but should it?
References
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#bestpractices-right-size-cluster
https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-version.html#clusters-clusterarn-versionput
https://docs.aws.amazon.com/msk/latest/developerguide/limits.html