Closed thedulus closed 3 years ago
@denisj07 which version of the operator are you using and on what platform? (OCP, k8s, etc)
@cnuland the operator is on version 0.2.5 running on OpenShift Container Platform 4.5
@denisj07 updating the image field is a feature (?) of openshift deplymentconfig, you can easily work around it by switching to Deployment
The excluded path should work with json standard path expressions, so in your case the expression would look like this:
.spec.template.spec.containers[n].image
where n is the container for which you want the image to be excluded (counting from 0).
You can also do .spec.template.spec.containers[*].image
, but I have never tested this operator with an expression that can in theory return multiple values.
@raffaelespazzoli thanks for your hints! Unfortunately switching to a Deployment is not an option as we rely on the image change trigger feature of OpenShift's DeploymentConfig.
I spent my last few hours trying all different kind of jsonpath combinations. But they don't seem to work. Instead this time I definitely got the proof that there is a race condition between the openshit-controller-manager and the namespace-configuration-operator:
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: 'true'
selfLink: /apis/apps.openshift.io/v1/namespaces/cld-432/deploymentconfigs/jenkins
- resourceVersion: '150528382'
+ resourceVersion: '150528547'
name: jenkins
uid: f0175f73-aaf2-442e-96a6-bc6b763247ac
creationTimestamp: '2021-01-13T15:03:25Z'
- generation: 3725
+ generation: 3764
managedFields:
- manager: namespace-configuration-operator
operation: Update
apiVersion: apps.openshift.io/v1
- time: '2021-01-13T15:56:39Z'
+ time: '2021-01-13T15:56:50Z'
fieldsType: FieldsV1
fieldsV1:
[...]
'f:spec':
[...]
'f:template':
[...]
'f:spec':
'f:containers':
.: {}
'k:{"name":"jenkins"}':
+ 'f:image': {}
[...]
+ 'f:triggers': {}
'f:status':
'f:availableReplicas': {}
'f:latestVersion': {}
'f:replicas': {}
'f:updatedReplicas': {}
- manager: openshift-controller-manager
operation: Update
apiVersion: apps.openshift.io/v1
- time: '2021-01-13T15:56:39Z'
+ time: '2021-01-13T15:56:51Z'
fieldsType: FieldsV1
fieldsV1:
- 'f:spec':
- 'f:template':
- 'f:spec':
- 'f:containers':
- 'k:{"name":"jenkins"}':
- 'f:image': {}
- 'f:triggers': {}
'f:status':
'f:observedGeneration': {}
'f:unavailableReplicas': {}
[...]
spec:
strategy:
type: Recreate
recreateParams:
timeoutSeconds: 600
resources: {}
activeDeadlineSeconds: 21600
triggers:
- type: ImageChange
imageChangeParams:
automatic: true
containerNames:
- jenkins
from:
kind: ImageStreamTag
namespace: jenkins
name: 'jenkins:latest'
- lastTriggeredImage: >-
- image-registry.openshift-image-registry.svc:5000/jenkins/jenkins@sha256:4bdad09b18e49064a5e043a7788c30185be9f0e50fc33beaf04a146dc43c53c2
template:
metadata:
[...]
spec:
[...]
containers:
- - image: >-
- image-registry.openshift-image-registry.svc:5000/jenkins/jenkins@sha256:4bdad09b18e49064a5e043a7788c30185be9f0e50fc33beaf04a146dc43c53c2
+ - image: ' '
[...]
status:
- observedGeneration: 2569
+ observedGeneration: 2609
These fields are changed by both, OpenShift and the operator, every second until forever (also puts a lot of load on the cluster...). My currently used excludedPaths:
excludedPaths:
- .spec.template.spec.containers[0].image
- .spec.triggers[0].imageChangeParams.lastTriggeredImage
- .spec.template
- .spec.triggers
Even .spec.template
and .spec.triggers
seem to be ignored by the operator. And my understanding is that they should definitely exclude the two imageRef fields?
@denisj07 could you see if the annotation trigger for the Deployment
resource type works for you?
https://developers.redhat.com/blog/2019/09/20/using-red-hat-openshift-image-streams-with-kubernetes-deployments/
I'm curious to see if that prevents the issue, if not let me know and I can try to reproduce the issue myself.
@cnuland I've tested the Deployment
type and that worked. The namespace-configuration-operator does not try to update the object anymore.
But now I have the same issue with another object... All of our Jenkins instances should have a Service of type NodePort
that we use to communicate with VMs for Windows-based builds. The nodePort is set by OpenShift / Kubernetes dynamically and this time our namespace-configuration-operator is not happy with that specific difference between definition and the actual object.
This is my service objectTemplate:
- objectTemplate: |
apiVersion: v1
kind: Service
metadata:
name: jenkins-jnlp
namespace: {{ .Name }}
spec:
ports:
- name: agent
port: 50000
protocol: TCP
targetPort: 50000
selector:
app: jenkins
sessionAffinity: None
type: NodePort
excludedPaths:
- .spec.ports.[0].nodePort
that translates into:
apiVersion: v1
kind: Service
metadata:
name: jenkins-jnlp
namespace: {{ .Name }}
spec:
ports:
- name: agent
-
+ nodePort: 30498
port: 50000
protocol: TCP
targetPort: 50000
selector:
app: jenkins
sessionAffinity: None
type: NodePort
The operator removes this nodePort attribute which gets added back by OpenShift in an endless loop which results in a perfect random number generator ;-)
A short extract from the logs:
{"level":"info","ts":1611928464.0998008,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1124558,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1203876,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1306062,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1413977,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.145114,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1700182,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.1701834,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
{"level":"info","ts":1611928464.1880972,"logger":"lockedresourcecontroller","msg":"reconcile called for","object":"v1/Service/cld-432/jenkins-jnlp","request":"cld-432/jenkins-jnlp"}
{"level":"info","ts":1611928464.188168,"logger":"namespace-config-operator","msg":"Reconciling NamespaceConfig","Request.Namespace":"","Request.Name":"namespace-configuration-jenkins"}
@denisj07 confirmed this is an issue, you can do this declaratively by just setting the nodePort, but I'm guessing you want the randomness since then only one service can be in use. I'll let you know when there's a fix to this issue in a future release.
@denisj07 can you try updating to version 1.0.1 and see if you see the same issue? I'm unable to reproduce after we made some fixes in that version.
@cnuland Unfortunately I had to reinstall the operator, as OLM didn't want to do that automatically. I'm not quite sure but I think this is due to the changes in the available installModes
(Type: OwnNamespace
/ Type: SingleNamspace
from version < 1.0.0 was changed to Type: AllNamespaces
). My guess is that it wanted the Operator to be installed in the openshift-operators
namespace which was not the case.
So now the operator is on version 1.0.1 but it's still trying to update the Service object indefinitely because of the nodePort
attribute that gets added by OpenShift automatically. My current template definition is:
- objectTemplate: |
apiVersion: v1
kind: Service
metadata:
name: jenkins-jnlp
namespace: {{ .Name }}
spec:
ports:
- name: agent
protocol: TCP
port: 50000
targetPort: 50000
selector:
app: jenkins
sessionAffinity: None
type: LoadBalancer
excludedPaths:
- .spec.clusterIP
- .spec.externalIPs
- .spec.ports[0].nodePort
@denisj07 can you check to see if version 1.0.3 is available now on operator hub? I believe the fix should be within that version based on my tests.
Haven't heard any input on this in a while and can confirm this issue was fixed in 1.0.3, if you continue to have issues let me know.
I use the namespace-configuration-operator to deploy jenkins instances to namespaces with a 'build' label.
Unfortunately the namespace-configuration-operator tries to update the deployment config's image field (or gets triggered because this field was updated by OpenShift). This leads to endless reconciliation attempts:
This is the namespace configuration object:
The image ref in
.spec.templates.spec.containers
is intentionally left blank, because I want the ImageChange trigger to handle it. As soon as I set a defined image ref to this field, the endless reconciliation attempts stop unless I push a new image and the ImageChange trigger updates the image field to another new value.From my understanding of the documentation, this is what the excludedPath array is for. I tried all kind of jsonpaths, but so far without success.
Some examples which did not work:
Maybe someone could give me the right directions here.