red-hat-storage / ocs-operator

Operator for RHOCS
Apache License 2.0

Use TSCs in-place of Pod affinity/anti-affinity & always enforce them even if custom placement is specified #2720

Closed malayparida2000 closed 3 weeks ago

malayparida2000 commented 1 month ago

Currently, topology spread constraints (TSCs) are enforced only when no other placement spec is defined. Without the TSCs, pods might not be distributed evenly, so we should always enforce them, even when a custom placement is specified.
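To illustrate the "always enforce" behavior described above, here is a minimal Go sketch. It is not the code from this PR: the `Placement` and `TopologySpreadConstraint` types are simplified stand-ins for the Kubernetes structs, `ensureDefaultTSCs` is a hypothetical helper name, the label values are illustrative, and skipping a default when the user already set a constraint for the same topology key is an assumption about how conflicts could be resolved.

```go
package main

import "fmt"

// TopologySpreadConstraint is a simplified stand-in for the Kubernetes
// topology spread constraint type, reduced to the fields used here.
type TopologySpreadConstraint struct {
	MaxSkew           int32
	TopologyKey       string
	WhenUnsatisfiable string
	MatchLabels       map[string]string
}

// Placement is a hypothetical, trimmed-down placement spec.
type Placement struct {
	NodeSelector              map[string]string
	TopologySpreadConstraints []TopologySpreadConstraint
}

// ensureDefaultTSCs returns a copy of custom that keeps the user's settings
// and appends every default constraint whose topology key the user has not
// already covered (assumed conflict rule: the user's constraint wins).
func ensureDefaultTSCs(custom Placement, defaults []TopologySpreadConstraint) Placement {
	merged := Placement{NodeSelector: custom.NodeSelector}
	merged.TopologySpreadConstraints = append(
		merged.TopologySpreadConstraints, custom.TopologySpreadConstraints...)

	// Track which topology keys are already constrained.
	seen := make(map[string]bool)
	for _, tsc := range custom.TopologySpreadConstraints {
		seen[tsc.TopologyKey] = true
	}
	for _, tsc := range defaults {
		if !seen[tsc.TopologyKey] {
			merged.TopologySpreadConstraints = append(merged.TopologySpreadConstraints, tsc)
			seen[tsc.TopologyKey] = true
		}
	}
	return merged
}

func main() {
	defaults := []TopologySpreadConstraint{{
		MaxSkew:           1,
		TopologyKey:       "kubernetes.io/hostname",
		WhenUnsatisfiable: "ScheduleAnyway",
		MatchLabels:       map[string]string{"app": "example-daemon"}, // illustrative label
	}}
	// A custom placement that sets only a node selector still gets the default TSC.
	custom := Placement{NodeSelector: map[string]string{"storage": "true"}}
	merged := ensureDefaultTSCs(custom, defaults)
	fmt.Println(len(merged.TopologySpreadConstraints))
}
```

The point of the sketch is that the custom placement is extended rather than treated as an all-or-nothing override, so the node selector and the spread constraint both take effect.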

This also refactors the code to use TSCs in place of pod affinity/anti-affinity for Ceph daemons such as mgr, mon, mds, rgw, and nfs.
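As a rough picture of what replacing pod anti-affinity with a TSC could look like, the Go sketch below maps an anti-affinity term onto a spread constraint over the same topology key and labels. The types are simplified stand-ins for the Kubernetes structs, `tscFromAntiAffinity` is a hypothetical helper, and the label is illustrative; none of it is taken from the PR. The two mechanisms are also not strictly equivalent: required anti-affinity forbids two matching pods in one topology domain, while a TSC with `MaxSkew: 1` only bounds the difference in pod counts between domains.

```go
package main

import "fmt"

// PodAffinityTerm and TopologySpreadConstraint are simplified stand-ins for
// the Kubernetes scheduling types (assumptions, not the real structs).
type PodAffinityTerm struct {
	TopologyKey string
	MatchLabels map[string]string
}

type TopologySpreadConstraint struct {
	MaxSkew           int32
	TopologyKey       string
	WhenUnsatisfiable string
	MatchLabels       map[string]string
}

// tscFromAntiAffinity builds a spread constraint over the same topology key
// and labels as the given anti-affinity term. A required term maps to the
// hard "DoNotSchedule" policy, a preferred term to the soft "ScheduleAnyway".
func tscFromAntiAffinity(term PodAffinityTerm, required bool) TopologySpreadConstraint {
	policy := "ScheduleAnyway"
	if required {
		policy = "DoNotSchedule"
	}
	return TopologySpreadConstraint{
		MaxSkew:           1, // allow at most a one-pod difference between domains
		TopologyKey:       term.TopologyKey,
		WhenUnsatisfiable: policy,
		MatchLabels:       term.MatchLabels,
	}
}

func main() {
	term := PodAffinityTerm{
		TopologyKey: "kubernetes.io/hostname",
		MatchLabels: map[string]string{"app": "example-mon"}, // illustrative label
	}
	fmt.Printf("%+v\n", tscFromAntiAffinity(term, true))
}
```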

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: malayparida2000

Once this PR has been reviewed and has the lgtm label, please assign nbalacha for approval. For more information see the Kubernetes Code Review Process.


Needs approval from an approver in each of these files:
- **[OWNERS](https://github.com/red-hat-storage/ocs-operator/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

malayparida2000 commented 1 month ago

> This also refactors the code to use TSCs in-place of Pod affinity/anti-affinity for ceph daemons like mgr, mon, mds, rgw, nfs.
>
> While I would expect the TSCs to work in place of the pod anti-affinity, it also feels like it has a bit of risk. Could you confirm some manual testing with this before merge? And we will want to point out to QE to watch for placement issues in case of regression.
>
> Alternatively, it seems lower risk to always append our pod anti-affinity to the user's placement, instead of replacing it with a TSC.

Given the high-risk nature of this change, I am thinking of not taking it to 4.17 and keeping it in 4.18 only, though nothing is concrete yet. I think TSCs satisfy our requirement of even distribution perfectly, and they are a cleaner approach than pod anti-affinity, so I am leaning toward always using TSCs.

malayparida2000 commented 1 month ago

/hold

openshift-ci[bot] commented 1 month ago

@malayparida2000: The following test failed. Say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/ocs-operator-bundle-e2e-aws | ac0d6c36f0355ad4538474cd0ddc15541df0bd7f | link | true | `/test ocs-operator-bundle-e2e-aws` |
