kube-object-storage / lib-bucket-provisioner

Library for the dynamic provisioning of object store buckets to be used by object store providers.
Apache License 2.0
20 stars 22 forks source link

Allow provisioning to be idempotent #200

Closed BlaineEXE closed 3 years ago

BlaineEXE commented 3 years ago

If Provisioner implementers fail to include the objectBucketName parameter as part of the validated ObjectBucketClaim CRD, this library will not be able to determine if a claim is difinitively bound. In this case, the library will continue to try to provision claims that have already been provisioned. This was the case in Rook with the bug below.

https://github.com/rook/rook/issues/6650

If an implementing Provisioner fails to include "objectBucketName" in the CRDs needed for object bucket claims, Kubernetes may drop the unknown "objectBucketName" parameter that this library uses to determine if an OBC is bound to an OB. Allow this library to recover from this case or other degraded cases that are similar by checking whether the Object Bucket that would be created for the claim already exists.

If the OB exists during provisioning and has an owner reference to the OBC, then do not continue provisioning. Instead, attempt to update the OBC to refer to the OB via "objectBucketName", and update the phase to "Bound".

Another unlikely but possible case where this degraded case might be encountered if the Provisioner operator crashes after updating the OB but before the OBC can be updated. The changes here will also help recover from this edge case.

Also:

When provisioning an OBC, the library can fail on creation of secret or configmap in the case where the operator crashes after secret and/or configmap are created but before provisioning is finished. Allow the provision code to find existing secrets and/or configmaps and continue with provisioning if they exist rather than erroring out to infinitely fail provisioning in a loop.

Signed-off-by: Blaine Gardner blaine.gardner@redhat.com

@thotz @leseb @travisn @guymguym @jeffvance @copejon

BlaineEXE commented 3 years ago

I think I just realized that this will help with the case where bucketName is given but will effectively remove a bucket that a user might have already been using if generateBucketName is set. I need to do more work on this.

BlaineEXE commented 3 years ago

@thotz @leseb @travisn @guymguym @jeffvance @copejon please have a look now

BlaineEXE commented 3 years ago

Update: I need to make another adjustment to handle the case for upgrading old claims with generateBucketName set. Older versions of the library set bucketName once a bucket was created for the claim, which is an error condition if provisioning runs again.

BlaineEXE commented 3 years ago

I have tested upgrade scenarios with Rook from https://github.com/rook/rook/pull/6788 for both cases below with success:

I also did some manual testing where I injected os.Exit calls into the code and made sure that the changes to Secret and ConfigMap creation to make them idempotent worked as expected. During these tests I found that deleting the resource was not enough, and I had to delete the resource and wait for the resource to be deleted with a Get call.

BlaineEXE commented 3 years ago

For reference, I tested idempotency of this PR with the following commit while creating new OBCs: https://github.com/BlaineEXE/lib-bucket-provisioner/commit/e9462e2ad742d8a909cf1aabbfa308cd6e5b642d

I tested upgrades from Rook v1.5.1 (before #198) to this PR for the following cases:

During upgrades, verify that:

jeffvance commented 3 years ago

Looks good @BlaineEXE . I just have a couple of mostly nit questions.

jeffvance commented 3 years ago

LGTM