kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/
Apache License 2.0
974 stars 787 forks source link

Use new client token when `CreateVolume` returns `IdempotentParameterMismatch` #2075

Closed ConnorJC3 closed 2 months ago

ConnorJC3 commented 3 months ago

Is this a bug fix or adding new feature?

Fixes #1951

What is this PR about? / Why do we need it?

If CreateVolume fails for any reason (examples: the user provides an invalid KMS key, due to an EBS-side issue, etc), we retry with the same client token. However, despite the fact that the CreateVolume call corresponds to a volume that was never created, our client token is burned.

In the scenario where we detect this has occurred, this PR tries again with a different token. However, to prevent volume leaks, the token must follow a predictable pattern. To do this, I append -2 to the volume id pre-hashing for the token, and if that request fails, I instead append -3, -4, etc. This is done strictly in order, so a CSI driver that crashes (or restarts for any other reason, such as an upgrade) can consistently reuse the same tokens until it reaches the 'correct' token.

To keep track of which token we have most recently used during runtime, I use an expiring cache, similar to the existing "likely bad names" cache implementation. In order to DRY up the code, the first two commits of this PR migrate that implementation to the util package (commit 1) and then migrate the existing likelyBadNames implementation to use the util version (commit 2). Finally, this PR implements the above described change using this cache (commit 3).

What testing is done?

Added/updated unit tests, CI, manual

github-actions[bot] commented 3 months ago

Code Coverage Diff

File Old Coverage New Coverage Delta
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/cloud.go 84.7% 85.4% 0.7
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/expiringcache/expiring_cache.go Does not exist 100.0%
ConnorJC3 commented 3 months ago

/retest

ConnorJC3 commented 3 months ago

/retest

AndrewSirenko commented 2 months ago

/lgtm /approve

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/OWNERS)~~ [AndrewSirenko] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment