Closed ConnorJC3 closed 2 months ago
File | Old Coverage | New Coverage | Delta |
---|---|---|---|
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/cloud/cloud.go | 84.7% | 85.4% | 0.7 |
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/expiringcache/expiring_cache.go | Does not exist | 100.0% | |
/retest
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: AndrewSirenko
Is this a bug fix or adding new feature?
Fixes #1951
What is this PR about? / Why do we need it?
If `CreateVolume` fails for any reason (for example, the user provides an invalid KMS key, or there is an EBS-side issue), we retry with the same client token. However, even though the `CreateVolume` call corresponds to a volume that was never created, our client token is burned.

In the scenario where we detect this has occurred, this PR tries again with a different token. However, to prevent volume leaks, the token must follow a predictable pattern. To do this, I append `-2` to the volume id pre-hashing for the token, and if that request fails, I instead append `-3`, `-4`, etc. This is done strictly in order, so a CSI driver that crashes (or restarts for any other reason, such as an upgrade) can consistently reuse the same tokens until it reaches the 'correct' token.

To keep track of which token we have most recently used during runtime, I use an expiring cache, similar to the existing "likely bad names" cache implementation. In order to DRY up the code, the first two commits of this PR migrate that implementation to the `util` package (commit 1) and then migrate the existing `likelyBadNames` implementation to use the `util` version (commit 2). Finally, this PR implements the above-described change using this cache (commit 3).

What testing is done?
Added/updated unit tests, CI, manual