kubernetes-sigs / container-object-storage-interface-controller

Container Object Storage Interface (COSI) controller responsible for managing the lifecycle of COSI objects. NOTE: The content of this repo has been moved to https://github.com/kubernetes-sigs/container-object-storage-interface.
Apache License 2.0

When multiple BucketClaims are created in parallel, some associated Buckets are not deleted along with their BucketClaims #139

Open: vegullah opened this issue 1 month ago

vegullah commented 1 month ago

Bug Report

What happened: I created 5 BucketClaims in parallel, and all of the BucketClaims and their associated Buckets were created. When I deleted those BucketClaims, all of the BucketClaims were removed, but some of the Buckets were not deleted.

What you expected to happen: When a BucketClaim is deleted, its associated Bucket should be deleted as well.

How to reproduce this bug (as minimally and precisely as possible):

  1. Create 5-6 BucketClaims in parallel (using any script and a threading package).
  2. Delete the BucketClaims together in a single command.
  3. Repeat steps 1 and 2 a few times.
  4. Create 5-6 BucketClaims in parallel, reusing the same names as in step 1. Check whether the following error appears in the object controller pod logs:
    I1023 11:56:15.533581       1 bucketclaim.go:32] "Add BucketClaim" name="bclc" ns="default" bucketClass="bc1"
    E1023 11:56:15.545584       1 bucketclaim.go:197] "Failed to update status of BucketClaim" err="Operation cannot be fulfilled on bucketclaims.objectstorage.k8s.io \"bclc\": the object has been modified; please apply your changes to the latest version and try again" name=""
    E1023 11:56:15.545618       1 bucketclaim.go:53] "name" err="Operation cannot be fulfilled on bucketclaims.objectstorage.k8s.io \"bclc\": the object has been modified; please apply your changes to the latest version and try again" bclc="ns" default="err" Operation cannot be fulfilled on bucketclaims.objectstorage.k8s.io "bclc": the object has been modified; please apply your changes to the latest version and try again="(MISSING)"
  5. Delete the BucketClaims together in a single command (or one after another). The BucketClaims appear to be deleted, but a few Buckets remain. The sidecar logs show no delete request for the remaining Buckets, so something makes the controller delete the BucketClaim without waiting for its Bucket to be deleted. Attached logs: cosi-provisioner-sidecar.log, objectstorage-controller.log
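The reproduction steps above can be scripted roughly as follows. This is a minimal Python sketch, not the reporter's actual script: the BucketClaim fields (`bucketClassName`, `protocols: ["S3"]`), the `bc1` BucketClass (taken from the log line), the `default` namespace, and shelling out to `kubectl` are assumptions, and all helper names are hypothetical. The command runner is injectable so the parallel create/delete logic can be exercised without a cluster.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A runner takes kubectl arguments plus stdin text and returns an exit code.
Runner = Callable[[List[str], str], int]


def bucket_claim_manifest(name: str, namespace: str = "default",
                          bucket_class: str = "bc1") -> dict:
    # Assumed v1alpha1 BucketClaim shape; adjust to the CRD version you run.
    return {
        "apiVersion": "objectstorage.k8s.io/v1alpha1",
        "kind": "BucketClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"bucketClassName": bucket_class, "protocols": ["S3"]},
    }


def kubectl(args: List[str], stdin: str = "") -> int:
    # Shells out to kubectl (assumed to be on PATH) and returns its exit code.
    return subprocess.run(["kubectl", *args], input=stdin, text=True).returncode


def create_in_parallel(names: List[str], run: Runner = kubectl) -> List[int]:
    # One worker per claim so the create requests reach the API server concurrently.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = [pool.submit(run, ["apply", "-f", "-"],
                               json.dumps(bucket_claim_manifest(n)))
                   for n in names]
        return [f.result() for f in futures]


def delete_together(names: List[str], run: Runner = kubectl,
                    namespace: str = "default") -> int:
    # Deletes all claims with a single kubectl command (steps 2 and 5).
    return run(["delete", "bucketclaim", "-n", namespace, *names], "")


def reproduce(cycles: int = 4, run: Runner = kubectl) -> None:
    claims = [f"bcl{i}" for i in range(1, 6)]  # same names reused every cycle
    for _ in range(cycles):
        create_in_parallel(claims, run)
        # In a real run, wait here until the Buckets have been provisioned.
        delete_together(claims, run)
```

Against a test cluster, `reproduce(cycles=4)` runs the create/delete cycle four times; per the thread, leftover Buckets tend to appear on the 3rd or 4th cycle, which can be checked after each round (for example with `kubectl get buckets`).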

Anything else relevant for this bug report?:

Environment:

narayviv commented 1 month ago

Hi @BlaineEXE,

While going through the code of the controller and the sidecar, I see that both of them update the status of the BucketClaim (during the controller's BucketClaimListener#provisionBucketClaimOperation):
sidecar: https://github.com/kubernetes-sigs/container-object-storage-interface-provisioner-sidecar/blob/80979e8992a6a2b2166f3ff1e7d39b4ab03f045c/pkg/bucket/bucket_controller.go#L163
controller: https://github.com/kubernetes-sigs/container-object-storage-interface-controller/blob/38b4915c1bbc6b63144fa81351a72d228184a34c/pkg/bucketclaim/bucketclaim.go#L204

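In this scenario, the BucketClaim object held by the controller's provisionBucketClaimOperation method becomes outdated as soon as the sidecar updates the BucketClaim CR's status, so the controller's subsequent status update is rejected with the conflict error shown above.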
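The stale-object problem described above can be modelled with a toy version of Kubernetes optimistic concurrency: every object carries a resourceVersion, and the API server rejects a write made from an older version. The Python sketch below is hypothetical (it is not COSI or client-go code; `Store`, `Obj`, `Conflict`, and the status values are illustrative) and shows why the controller's copy is rejected once the sidecar has written first.

```python
import copy
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stands in for the API server's 'object has been modified' error."""


@dataclass
class Obj:
    name: str
    resource_version: int = 1
    status: dict = field(default_factory=dict)


class Store:
    """Toy API server that checks resourceVersion on every status write."""

    def __init__(self) -> None:
        self._objs = {}

    def create(self, obj: Obj) -> None:
        self._objs[obj.name] = copy.deepcopy(obj)

    def get(self, name: str) -> Obj:
        # Callers always receive a private copy, like a client-side GET.
        return copy.deepcopy(self._objs[name])

    def update_status(self, obj: Obj) -> Obj:
        current = self._objs[obj.name]
        if obj.resource_version != current.resource_version:
            raise Conflict(f'Operation cannot be fulfilled on "{obj.name}": '
                           "the object has been modified")
        current.status = copy.deepcopy(obj.status)
        current.resource_version += 1
        return copy.deepcopy(current)


store = Store()
store.create(Obj("bclc"))

controller_copy = store.get("bclc")  # controller reads the claim (version 1)
sidecar_copy = store.get("bclc")     # sidecar reads the same version

sidecar_copy.status["bucketReady"] = True
store.update_status(sidecar_copy)    # sidecar wins; version 1 -> 2

controller_copy.status["bucketName"] = "bucket-bclc"  # hypothetical value
try:
    store.update_status(controller_copy)  # still carries version 1
    controller_failed = False
except Conflict as err:
    controller_failed = True
    print(err)
```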

Our suggestion is to apply the sidecar's approach in the controller as well.

Note: We did not see this issue when BucketClaims were created sequentially with an adequate time gap. We noticed it when BucketClaims were created in parallel (or sequentially without an adequate time gap, through scripts/YAMLs).

Let us know if we can help further. CC: @vegullah

BlaineEXE commented 1 month ago

Thanks @narayviv, this sounds like something we should address in the v1alpha2 API updates as well. I created a new issue and started tracking it on the COSI kanban board.

BlaineEXE commented 1 month ago

@vegullah, it's not clear to me from your description whether this is a permanent or a temporary issue. Could you clarify?

If it is temporary, this is something that can occasionally happen with any controller: this error is how Kubernetes prevents multiple readers/writers from overwriting each other's changes. As long as the issue eventually resolves itself, I don't see a need to fix it urgently.
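A common way for a controller to converge after such a conflict is to re-read the latest object and reapply only its own change, retrying a bounded number of times; client-go provides this as `retry.RetryOnConflict` in `k8s.io/client-go/util/retry`. The self-contained Python sketch below imitates that loop against a toy store. It is an illustration under stated assumptions (the `Store`, `retry_on_conflict`, and status values are hypothetical), not the fix adopted for COSI, and the thread does not establish whether this conflict is what leaves Buckets behind.

```python
import copy
import time
from typing import Callable


class Conflict(Exception):
    """Stands in for the API server's 'object has been modified' error."""


class Store:
    """Toy API server: each write must carry the latest resourceVersion."""

    def __init__(self) -> None:
        self.objs = {}

    def get(self, name: str) -> dict:
        return copy.deepcopy(self.objs[name])

    def update(self, obj: dict) -> None:
        current = self.objs[obj["name"]]
        if obj["resourceVersion"] != current["resourceVersion"]:
            raise Conflict("the object has been modified")
        obj = copy.deepcopy(obj)
        obj["resourceVersion"] += 1
        self.objs[obj["name"]] = obj


def retry_on_conflict(fn: Callable[[], None], attempts: int = 5,
                      delay_s: float = 0.01) -> int:
    """Calls fn until it succeeds; returns how many attempts were needed.

    Re-raises the last Conflict if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            fn()
            return attempt
        except Conflict:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff between tries
    raise AssertionError("unreachable")


store = Store()
store.objs["bcl1"] = {"name": "bcl1", "resourceVersion": 1, "status": {}}
sidecar_has_written = False


def set_bucket_name() -> None:
    global sidecar_has_written
    claim = store.get("bcl1")          # always start from the latest copy
    if not sidecar_has_written:        # simulate one racing sidecar write
        racing = store.get("bcl1")
        racing["status"]["bucketReady"] = True
        store.update(racing)
        sidecar_has_written = True
    claim["status"]["bucketName"] = "bucket-bcl1"  # hypothetical value
    store.update(claim)


attempts = retry_on_conflict(set_bucket_name)
```

The design point is that the mutation function fetches a fresh copy on every attempt, so the sidecar's `bucketReady` write survives instead of being overwritten by a stale object. In Go, the equivalent would wrap the get, mutate, and status-update sequence in `retry.RetryOnConflict(retry.DefaultRetry, func() error { ... })`.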

As @narayviv mentioned, this might be caused by the controller and the sidecar both editing the resource. We will look into this and, at a minimum, see if we can make the reported error less frequent. Assuming it is not preventing reconciliation, the error itself does not seem concerning, but we also don't want it to happen every time and spam the logs.

vegullah commented 1 month ago

@BlaineEXE, you don't see the issue on the first 2 to 3 tries. Say you create 5 BucketClaims (bcl1, bcl2, bcl3, bcl4 and bcl5). You will be able to delete those 5 BucketClaims and their associated Buckets successfully. Usually this create/delete cycle (reusing the same BucketClaim names as the previous cycle) works a second time as well. Then, on the 3rd or 4th cycle, when you create the 5 BucketClaims (bcl1, bcl2, bcl3, bcl4 and bcl5) again and then delete them, some are deleted successfully, but for others the Buckets remain.

The issue does not resolve by itself unless you delete the controller and sidecar pods.