kube-object-storage / lib-bucket-provisioner

Library for the dynamic provisioning of object store buckets to be used by object store providers.
Apache License 2.0
20 stars 22 forks source link

Library is not thread safe #199

Closed BlaineEXE closed 3 years ago

BlaineEXE commented 3 years ago

Despite having the option to have multiple threads, I do not believe this library is thread safe. When creating an OBC with the bucketName defined, there is risk of the bucket being created and then deleted immediately. When the OBC's status is updated by the provisioner to "Pending" state the first time (in thread 1), this is counted as a change that results in the OBC being added to the work queue. Another thread (thread 2) will pick the OBC currently being created off the queue and then begin work on it at the same time. Thread 1 will likely create the secret and configmap after thread 2 gets the OBC and finds that it does not have the objectBucketName set, thus starting a second provision. Thread 2 will set the phase to "Pending" again. Thread 1 can complete provisioning after this and set the OBC status to "Bound" and Thread 1 will now be done. Thread 2 will fail to create the secret for the OBC because Thread 1 already created it, will error out, and then Thread 2 will clean up after itself, deleting the bucket bucketName. In this case, I believe the OBC will still have its phase as "Bound" as set by thread 1, and the bucket will have been deleted by cleanup in thread 2.

guymguym commented 3 years ago

Hey @BlaineEXE - For the same OBC the WorkQueue will synchronize the work. Do you see a case for how two workers can get the same item from the queue? Because afaik this was the intention of the queue using a namespace+name to identify the reconcile requests queue per each CR.

guymguym commented 3 years ago

See WorkQueue Add here - https://github.com/kubernetes/client-go/blob/fb61a7c88cb9f599363919a34b7c54a605455ffc/util/workqueue/queue.go#L107-L127

BlaineEXE commented 3 years ago

I verified yesterday that the item is re-queued once the phase is updated from "" to "Pending". I wouldn't consider this a duplicate item in the queue because the item has an updated resource version. I didn't try this in multithreaded mode to verify the race condition, however. I don't see any protections against a second worker taking a different queue item for a provision key that is already in progress on another worker. I hope this protection is there, but I have concerns and want to make sure there is no risk here.

guymguym commented 3 years ago

And here you see how the library controller just inserts a structure with name+namesapce into the queue, and not the OBC itself: https://github.com/kube-object-storage/lib-bucket-provisioner/blob/e56b2dc69af668e767a973bf745713c89b921485/pkg/provisioner/controller.go#L139-L147

guymguym commented 3 years ago

I verified yesterday that the item is re-queued once the phase is updated from "" to "Pending". I wouldn't consider this a duplicate item in the queue because the item has an updated resource version. I didn't try this in multithreaded mode to verify the race condition, however. I don't see any protections against a second worker taking a different queue item for a provision key that is already in progress on another worker. I hope this protection is there, but I have concerns and want to make sure there is no risk here.

But the items in the queue are strings...

guymguym commented 3 years ago

See cache.MetaNamespaceKeyFunc here: https://github.com/kubernetes/client-go/blob/fb61a7c88cb9f599363919a34b7c54a605455ffc/tools/cache/store.go#L92-L111

It's just namespace/name ...

BlaineEXE commented 3 years ago

So the implementation of the queue makes it thread safe?

BlaineEXE commented 3 years ago

I do see the queue has the following features from its doc:

* Fair: items processed in the order in which they are added.

* Stingy: a single item will not be processed multiple times concurrently,

    and if an item is added multiple times before it can be processed, it

    will only be processed once.

* Multiple consumers and producers. In particular, it is allowed for an

    item to be reenqueued while it is being processed.

* Shutdown notifications.

Being a stingy queue, it seems my concerns are unwarranted... I'd still like to run a test to make sure it is working as expected. I've been finding many small issues with the library's provisioning implementation recently and want to feel certain of it.

BlaineEXE commented 3 years ago

We did coincidentally see this issue in Rook https://github.com/rook/rook/issues/6751.

But after testing today and looking at Rook's logs, I believe the queue does make this library thread safe. I think the issue above may be related to things I fixed a few days ago. Thanks, @guymguym for your patience.