stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository

Support tolerations for OLM daemonsets #644

Open razvan opened 1 month ago

razvan commented 1 month ago

Description

Investigate how to support tolerations for operators deployed in OpenShift environments using OLM.

In particular, look into supporting tolerations for operators that contain DaemonSets, such as the secret and listener operators.

Nice to have: in addition to tolerations also consider other possible configurations such as images, resources, etc.
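For context, OLM already lets a Subscription carry tolerations under `spec.config.tolerations`, which it applies to the operator Deployments it manages. A minimal sketch of such a Subscription follows. The function name `render_subscription`, the argument values, and the `openshift-marketplace` source namespace are assumptions for illustration, not names taken from the Stackable catalogs. The toleration matches a `keep-out=yes:NoSchedule` taint.

```shell
# Sketch: print an OLM Subscription that carries one toleration.
# $1: package name, $2: namespace, $3: channel, $4: catalog source
# All four values are placeholders supplied by the caller.
render_subscription() {
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: $1
  namespace: $2
spec:
  name: $1
  channel: "$3"
  source: $4
  sourceNamespace: openshift-marketplace
  config:
    tolerations:
      - key: keep-out
        operator: Equal
        value: "yes"
        effect: NoSchedule
EOF
}

# Usage (hypothetical names):
#   render_subscription my-operator my-namespace stable my-catalog | kubectl apply -f -
```

The value `"yes"` is quoted on purpose: unquoted, many YAML parsers read it as a boolean rather than a string.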

Links

razvan commented 2 weeks ago

How to test

  1. Prepare a fresh OKD cluster with two worker nodes and taint the one that is not also the control plane.
# create cluster
$ replicated cluster create --name rami --distribution openshift --instance-type r1.large --nodes 2 --version 4.15.0-okd --ttl 6h  --disk 50

# retrieve cluster configuration
$ replicated cluster kubeconfig --name rami

# disable community-operators as it's broken and blocks the installation of other operators
$ kubectl patch operatorhubs/cluster --type merge --patch '{"spec":{"sources":[{"disabled": true,"name": "community-operators"}]}}'

# taint the worker node (not the control plane)
$ kubectl taint nodes worker-au377xjt keep-out=yes:NoSchedule
  2. Check out the branch feat/tolerations in the openshift-certified-operators repository. This branch modifies the existing OLM manifests to copy the subscription tolerations into the DaemonSets of the secret and listener operators. See also the diff below.

NOTE: this branch should not be merged into main, because existing certified operators in the "certified" catalog cannot be altered.

  3. Check out this PR from stackable utils.

  4. Install the secret and listener operators:

$ /olm/build-bundles.sh -r 24.7.0 -o secret -c $HOME/repo/stackable/openshift-certified-operators -d
$ /olm/build-bundles.sh -r 24.7.0 -o listener -c $HOME/repo/stackable/openshift-certified-operators -d
  5. Check that both operators have Pods running on the tainted worker node as well as on the control plane.

  6. Delete the cluster:

$ replicated cluster rm --name rami
  7. Done.
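The check that both operators have Pods on the tainted node can be scripted roughly as follows. This is a sketch only: the helper names are made up for illustration, and the node name and namespace must be supplied by the caller because they depend on the cluster and on where the operators were installed.

```shell
# Print the taints on node $1 as reported by the API server
# (empty output means the node has no taints).
node_taints() {
  kubectl get node "$1" -o jsonpath='{.spec.taints}'
}

# List the pods in namespace $2 that are scheduled on node $1.
pods_on_node() {
  kubectl get pods --namespace "$2" \
    --field-selector "spec.nodeName=$1" \
    --output name
}

# Succeed only if at least one pod in namespace $2 runs on node $1.
assert_pods_on_node() {
  count=$(pods_on_node "$1" "$2" | grep -c '^pod/' || true)
  [ "$count" -gt 0 ]
}

# Usage (placeholders in angle brackets):
#   node_taints <tainted-node>
#   assert_pods_on_node <tainted-node> <operator-namespace> && echo "pods present"
```

Running `assert_pods_on_node` once for the tainted worker and once for the control-plane node covers both halves of the check.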

OLM manifests diff

diff --git a/operators/stackable-listener-operator/24.7.0/manifests/listener-operator-manifests.yaml b/operators/stackable-listener-operator/24.7.0/manifests/listener-operator-manifests.yaml
index b8c635bd..d750ae97 100644
--- a/operators/stackable-listener-operator/24.7.0/manifests/listener-operator-manifests.yaml
+++ b/operators/stackable-listener-operator/24.7.0/manifests/listener-operator-manifests.yaml
@@ -15,6 +15,12 @@ data:
     -o name | head -n 1 | sed 's/^.*\///')
     export LISTENER_OPERATOR_CLUSTERROLE_UID=$(kubectl get clusterrole $LISTENER_OPERATOR_CLUSTERROLE_NAME -o go-template='{{.metadata.uid}}')

+    #
+    # Patch the daemonset manifest with any tolerations from the subscription.
+    #
+    SUBSCRIPTION_TOLERATIONS=$(kubectl get deployment secret-operator-deployer -o jsonpath='{.spec.template.spec.tolerations}')
+    export SUBSCRIPTION_TOLERATIONS="$SUBSCRIPTION_TOLERATIONS"
+
     for file in /manifests/*.yaml; do
       echo Deploying $file
       cat $file | envsubst | ./kubectl apply --wait -f -
@@ -150,6 +156,7 @@ data:
             - name: mountpoint
               hostPath:
                 path: /var/lib/kubelet/pods/
+          tolerations: $SUBSCRIPTION_TOLERATIONS
   cluster-internal.yaml: |
     ---
     apiVersion: listeners.stackable.tech/v1alpha1
diff --git a/operators/stackable-secret-operator/24.7.0/manifests/secret-operator-manifests.yaml b/operators/stackable-secret-operator/24.7.0/manifests/secret-operator-manifests.yaml
index 87856495..9d22193c 100644
--- a/operators/stackable-secret-operator/24.7.0/manifests/secret-operator-manifests.yaml
+++ b/operators/stackable-secret-operator/24.7.0/manifests/secret-operator-manifests.yaml
@@ -21,6 +21,15 @@ data:
     SECRET_OPERATOR_CLUSTERROLE_UID=$(kubectl get clusterrole "$SECRET_OPERATOR_CLUSTERROLE_NAME" -o go-template='{{.metadata.uid}}')
     export SECRET_OPERATOR_CLUSTERROLE_UID="$SECRET_OPERATOR_CLUSTERROLE_UID"

+    #
+    # Patch the daemonset manifest with any tolerations from the subscription.
+    #
+    SUBSCRIPTION_TOLERATIONS=$(kubectl get deployment secret-operator-deployer -o jsonpath='{.spec.template.spec.tolerations}')
+    export SUBSCRIPTION_TOLERATIONS="$SUBSCRIPTION_TOLERATIONS"
+
+    #
+    # Create secret op objects
+    #
     for file in /manifests/*.yaml; do
       echo Deploying $file
       cat $file | envsubst | ./kubectl apply --wait -f -
@@ -222,3 +231,4 @@ data:
                 path: /var/lib/kubelet/pods/
             - name: tmp
               emptyDir: {}
+          tolerations: $SUBSCRIPTION_TOLERATIONS
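One detail of the diff worth spelling out: the tolerations are injected as a plain text substitution. Recent kubectl versions print lists selected with `-o jsonpath` as JSON, and a JSON array is also valid YAML flow syntax, so the substituted line parses as a normal list. When the Subscription has no tolerations, the variable is empty and the line reads `tolerations:` with no value. The sketch below imitates the substitution with `printf` instead of `envsubst` and adds an explicit `[]` fallback. The fallback and the function name are assumptions for illustration and are not part of the branch.

```shell
# Sketch: render the tolerations line as the patched manifest would,
# but fall back to an explicit empty list when the variable is unset or empty.
# printf stands in for envsubst here so the example has no extra dependencies.
render_tolerations_line() {
  printf '          tolerations: %s\n' "${SUBSCRIPTION_TOLERATIONS:-[]}"
}

# Usage:
#   SUBSCRIPTION_TOLERATIONS='[{"effect":"NoSchedule","key":"keep-out","operator":"Equal","value":"yes"}]'
#   render_tolerations_line
```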