redhat-cop / namespace-configuration-operator

The namespace-configuration-operator helps keeping configurations related to Users, Groups and Namespaces aligned with one of more policies specified as a CRs
Apache License 2.0
204 stars 55 forks source link

Need support for excludedPaths and image refs #82

Closed thedulus closed 3 years ago

thedulus commented 3 years ago

I use the namespace-configuration-operator to deploy jenkins instances to namespaces with a 'build' label.

Unfortunately the namespace-configuration-operator tries to update the deployment config's image field (or gets triggered because this field was updated by OpenShift). This leads to endless reconciliation attempts:

{"level":"info","ts":1605865411.669251,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"controller_locked_object_apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","source":"kind source: apps.openshift.io/v1, Kind=DeploymentConfig"}
{"level":"info","ts":1605865411.6842124,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-global"}
{"level":"info","ts":1605865411.6952496,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-run"}
{"level":"info","ts":1605865411.7696338,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"controller_locked_object_apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","source":"channel source: 0xc003a800a0"}
{"level":"info","ts":1605865411.7696893,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"controller_locked_object_apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins"}
{"level":"info","ts":1605865411.7696989,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"controller_locked_object_apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","worker count":1}
{"level":"info","ts":1605865411.7697773,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865411.8084157,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865411.8084657,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865411.833293,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865412.7022667,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865412.734059,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865412.7342825,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865412.764234,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865412.915381,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865412.948109,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865412.9482353,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865412.9742682,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865412.9746716,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865413.011987,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865413.9748347,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865414.007038,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"apps.openshift.io/v1/DeploymentConfig/cld-432/jenkins","request":"cld-432/jenkins"}
{"level":"info","ts":1605865414.0072663,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1605865414.0354214,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}

[this goes on forever...]

This is the namespace configuration object:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: namespace-configuration-jenkins
spec:
  labelSelector:
    matchLabels:
      flexis.io/namespace-type: jenkins-test
  templates:
    - objectTemplate: |
        apiVersion: apps.openshift.io/v1
        kind: DeploymentConfig
        metadata:
          annotations:
            template.alpha.openshift.io/wait-for-ready: "true"
          name: jenkins
          namespace: {{ .Name }}
        spec:
          replicas: 1
          selector:
            name: jenkins
          strategy:
            type: Recreate
          template:
            metadata:
              labels:
                name: jenkins
            spec:
              containers:
              - env: [redacted]
                image: ' '
                imagePullPolicy: IfNotPresent
                livenessProbe:
                  failureThreshold: 2
                  httpGet:
                    path: /login
                    port: 8080
                  initialDelaySeconds: 420
                  periodSeconds: 360
                  timeoutSeconds: 240
                name: jenkins
                readinessProbe:
                  httpGet:
                    path: /login
                    port: 8080
                  initialDelaySeconds: 3
                  timeoutSeconds: 240
                resources:
                  requests:
                    cpu: 100m
                    memory: 1Gi
                  limits:
                    cpu: 1
                    memory: 2Gi
                securityContext:
                  capabilities: {}
                  privileged: false
                terminationMessagePath: /dev/termination-log
                volumeMounts:
                - mountPath: /var/lib/jenkins
                  name: jenkins-data
                - mountPath: /etc/pki/ca-trust/source/anchors
                  name: jenkins-trusted-ca-bundle
              dnsPolicy: ClusterFirst
              restartPolicy: Always
              serviceAccountName: jenkins
              serviceAccount: jenkins
              volumes:
              - name: jenkins-data
                persistentVolumeClaim:
                  claimName: jenkins
              - configMap:
                  name: jenkins-trusted-ca-bundle
                  optional: true
                name: jenkins-trusted-ca-bundle
          triggers:
          - imageChangeParams:
              automatic: true
              containerNames:
              - jenkins
              from:
                kind: ImageStreamTag
                name: jenkins:latest
                namespace: jenkins
            type: ImageChange
          - type: ConfigChange

The image ref in .spec.templates.spec.containers is intentionally left blank, because I want the ImageChange trigger to handle it. As soon as I set a defined image ref to this field, the endless reconciliation attempts stop unless I push a new image and the ImageChange trigger updates the image field to another new value.

From my understanding of the documentation, this is what the excludedPath array is for. I tried all kind of jsonpaths, but so far without success.

Some examples which did not work:

excludedPaths:
- .spec
- ..image
- .spec.template.spec.containers.*.image

Maybe someone could give me the right directions here.

cnuland commented 3 years ago

@denisj07 which version of the operator are you using and on what platform? (OCP, k8s, etc)

thedulus commented 3 years ago

@cnuland the operator is on version 0.2.5 running on OpenShift Container Platform 4.5

raffaelespazzoli commented 3 years ago

@denisj07 updating the image field is a feature (?) of openshift deplymentconfig, you can easily work around it by switching to Deployment The excluded path should work with json standard path expressions, so in your case the expression would look like this: .spec.template.spec.containers[n].image where n is the container for which you want the image to be excluded (counting from 0). You can also do .spec.template.spec.containers[*].image, but I have never tested this operator with an expression that can in theory return multiple values.

thedulus commented 3 years ago

@raffaelespazzoli thanks for your hints! Unfortunately switching to a Deployment is not an option as we rely on the image change trigger feature of OpenShift's DeploymentConfig.

I spent my last few hours trying all different kind of jsonpath combinations. But they don't seem to work. Instead this time I definitely got the proof that there is a race condition between the openshit-controller-manager and the namespace-configuration-operator:

kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  annotations:
    template.alpha.openshift.io/wait-for-ready: 'true'
  selfLink: /apis/apps.openshift.io/v1/namespaces/cld-432/deploymentconfigs/jenkins
- resourceVersion: '150528382'
+ resourceVersion: '150528547'
  name: jenkins
  uid: f0175f73-aaf2-442e-96a6-bc6b763247ac
  creationTimestamp: '2021-01-13T15:03:25Z'
- generation: 3725
+ generation: 3764
  managedFields:
    - manager: namespace-configuration-operator
      operation: Update
      apiVersion: apps.openshift.io/v1
-     time: '2021-01-13T15:56:39Z'
+     time: '2021-01-13T15:56:50Z'
      fieldsType: FieldsV1
      fieldsV1:
[...]
        'f:spec':
[...]
          'f:template':
[...]
            'f:spec':
              'f:containers':
                .: {}
                'k:{"name":"jenkins"}':
+                 'f:image': {}
[...]
+         'f:triggers': {}
        'f:status':
          'f:availableReplicas': {}
          'f:latestVersion': {}
          'f:replicas': {}
          'f:updatedReplicas': {}
    - manager: openshift-controller-manager
      operation: Update
      apiVersion: apps.openshift.io/v1
-     time: '2021-01-13T15:56:39Z'
+     time: '2021-01-13T15:56:51Z'
      fieldsType: FieldsV1
      fieldsV1:
-       'f:spec':
-         'f:template':
-           'f:spec':
-             'f:containers':
-               'k:{"name":"jenkins"}':
-                 'f:image': {}
-         'f:triggers': {}
        'f:status':
          'f:observedGeneration': {}
          'f:unavailableReplicas': {}
[...]
spec:
  strategy:
    type: Recreate
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    activeDeadlineSeconds: 21600
  triggers:
    - type: ImageChange
      imageChangeParams:
        automatic: true
        containerNames:
          - jenkins
        from:
          kind: ImageStreamTag
          namespace: jenkins
          name: 'jenkins:latest'
-       lastTriggeredImage: >-
-         image-registry.openshift-image-registry.svc:5000/jenkins/jenkins@sha256:4bdad09b18e49064a5e043a7788c30185be9f0e50fc33beaf04a146dc43c53c2
template:
    metadata:
[...]
    spec:
[...]
      containers:
-       - image: >-
-           image-registry.openshift-image-registry.svc:5000/jenkins/jenkins@sha256:4bdad09b18e49064a5e043a7788c30185be9f0e50fc33beaf04a146dc43c53c2
+       - image: ' '
[...]
status:
- observedGeneration: 2569
+ observedGeneration: 2609

These fields are changed by both, OpenShift and the operator, every second until forever (also puts a lot of load on the cluster...). My currently used excludedPaths:

      excludedPaths:
      - .spec.template.spec.containers[0].image
      - .spec.triggers[0].imageChangeParams.lastTriggeredImage
      - .spec.template
      - .spec.triggers

Even .spec.template and .spec.triggers seem to be ignored by the operator. And my understanding is that they should definitely exclude the two imageRef fields?

cnuland commented 3 years ago

@denisj07 could you see if the annotation trigger for the Deployment resource type works for you? https://developers.redhat.com/blog/2019/09/20/using-red-hat-openshift-image-streams-with-kubernetes-deployments/

I'm curious to see if that prevents the issue, if not let me know and I can try to reproduce the issue myself.

thedulus commented 3 years ago

@cnuland I've tested the Deployment type and that worked. The namespace-configuration-operator does not try to update the object anymore.

But now I have the same issue with another object... All of our Jenkins instances should have a Service of type NodePort that we use to communicate with VMs for Windows-based builds. The nodePort is set by OpenShift / Kubernetes dynamically and this time our namespace-configuration-operator is not happy with that specific difference between definition and the actual object.

This is my service objectTemplate:

    - objectTemplate: |
        apiVersion: v1
        kind: Service
        metadata:
          name: jenkins-jnlp
          namespace: {{ .Name }}
        spec:
          ports:
          - name: agent
            port: 50000
            protocol: TCP
            targetPort: 50000
          selector:
            app: jenkins
          sessionAffinity: None
          type: NodePort
      excludedPaths:
        - .spec.ports.[0].nodePort

that translates into:

        apiVersion: v1
        kind: Service
        metadata:
          name: jenkins-jnlp
          namespace: {{ .Name }}
        spec:
          ports:
          - name: agent
-
+           nodePort: 30498
            port: 50000
            protocol: TCP
            targetPort: 50000
          selector:
            app: jenkins
          sessionAffinity: None
          type: NodePort

The operator removes this nodePort attribute which gets added back by OpenShift in an endless loop which results in a perfect random number generator ;-)

A short extract from the logs:

{"level":"info","ts":1611928464.0998008,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1124558,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1203876,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1306062,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1413977,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.145114,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1700182,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1701834,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1880972,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.188168,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
cnuland commented 3 years ago

@denisj07 confirmed this is an issue, you can do this declaratively by just setting the nodePort, but I'm guessing you want the randomness since then only one service can be in use. I'll let you know when there's a fix to this issue in a future release.

cnuland commented 3 years ago

@denisj07 can you try updating to version 1.0.1 and see if you see the same issue? I'm unable to reproduce after we made some fixes in that version.

thedulus commented 3 years ago

@cnuland Unfortunately I had to reinstall the operator, as OLM didn't want to do that automatically. I'm not quite sure but I think this is due to the changes in the available installModes (Type: OwnNamespace / Type: SingleNamspace from version < 1.0.0 was changed to Type: AllNamespaces). My guess is that it wanted the Operator to be installed in the openshift-operators namespace which was not the case.

So now the operator is on version 1.0.1 but it's still trying to update the Service object indefinitely because of the nodePort attribute that gets added by OpenShift automatically. My current template definition is:

- objectTemplate: |
    apiVersion: v1
    kind: Service
    metadata:
      name: jenkins-jnlp
      namespace: {{ .Name }}
    spec:
      ports:
        - name: agent
          protocol: TCP
          port: 50000
          targetPort: 50000
      selector:
        app: jenkins
      sessionAffinity: None
      type: LoadBalancer
  excludedPaths:
  - .spec.clusterIP
  - .spec.externalIPs
  - .spec.ports[0].nodePort
cnuland commented 3 years ago

@denisj07 can you check to see if version 1.0.3 is available now on operator hub? I believe the fix should be within that version based on my tests.

cnuland commented 3 years ago

Haven't heard any input on this in a while and can confirm this issue was fixed in 1.0.3, if you continue to have issues let me know.