Open everettraven opened 1 month ago
Can you give a concrete example of "when concurrent reconciliation is allowed" including why it would be? It seems like we'd always want Helm's built-in support to ensure the same resources are not managed by multiple Helm Releases. If the possible concurrent manager of a resource is some operator then maybe we need to surface Helm's locks as o-c's own and document, as best practice, for operator authors to respect the locks?
It is simple to implement an admission policy that catches this situation generally, during Kubernetes admission, rather than relying on a client to do it (which is what happens now).
Helm's built-in support is problematic for three reasons:
We may need to increase the concurrency of our reconciler for a variety of reasons. Today, reconcile blocks to populate/update the catalog cache and to pull bundle images. In the future, we may need to support Helm charts with hooks that block progression of install/upgrade/uninstall execution, which happens synchronously in the reconciler.
In order to scale to clusters with frequent ClusterExtension interactions, we will very likely need to handle ClusterExtension reconciles concurrently. As soon as we do that, Helm's guarantees disappear because we will be calling it concurrently.
I'll take this over and introduce the VAP (ValidatingAdmissionPolicy). I'll see if I can find a way to test the race condition.
As mentioned in #736, Helm has support for ensuring the same resources are not managed by multiple Helm Releases. This is sufficient when no concurrent reconciliation is possible, but we will need to come up with an alternative solution that prevents race conditions when concurrent reconciliation is allowed.