Open clemensutschig opened 3 years ago
@michaelsauter can you take a look into this, this is pretty critical
It's weird, because we honor this case in https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L195-L197 - but somehow this does NOT seem to work?!
@jorge-romero @metmajer - this is a blocker for 4 ...
I am inclined to put a while loop around the getLatestVersion - this should never fail ..
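To make the "while loop around getLatestVersion" idea concrete, here is a minimal sketch in Python (not the library's Groovy). It assumes a caller-supplied `get_latest_version` callable; the function name `poll_latest_version`, the attempt count and the delay are all hypothetical and not taken from OpenShiftService.groovy.

```python
import time
from typing import Callable, Optional


def poll_latest_version(
    get_latest_version: Callable[[], Optional[int]],
    prior_version: int,
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until the reported version exceeds prior_version, or give up.

    Returns the newer version, or None if it never advanced.
    """
    for attempt in range(attempts):
        try:
            version = get_latest_version()
        except Exception:
            # Assumption: a failed lookup is treated as "not known yet".
            version = None
        if version is not None and version > prior_version:
            return version
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting; a bounded number of attempts avoids hanging the pipeline if the version never changes.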
Do you have logs prior to what is shown above? I am slightly confused by the output as it seems to continue after error: #11 is already in progress? I wonder on which command it actually fails ... it looks like https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L213 fails ... which I don't get?
Actually, maybe the failure is from https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L199. That would occur if the rollout is running already (which it is, looking at the logs) but somehow status.latestVersion is not greater than the version passed.
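The behaviour described here (a failed rollout is tolerated only if the version has moved past the one passed in) could be sketched roughly as follows. This is an illustration in Python, not the code in OpenShiftService.groovy; `RolloutError`, `run_rollout` and `get_latest_version` are hypothetical stand-ins.

```python
from typing import Callable


class RolloutError(RuntimeError):
    """Hypothetical error raised when the rollout command exits non-zero."""


def start_rollout(
    run_rollout: Callable[[], None],
    get_latest_version: Callable[[], int],
    prior_version: int,
) -> int:
    """Trigger a rollout; tolerate failure if a newer rollout already exists."""
    try:
        run_rollout()
    except RolloutError:
        latest = get_latest_version()
        # A rollout that is already in progress is fine, as long as the
        # version is greater than the one recorded beforehand.
        if latest > prior_version:
            return latest
        # Otherwise the failure is real and bubbles up to the caller.
        raise
    return get_latest_version()
```

In this sketch the failure reported in the thread corresponds to the `raise` branch: the rollout command fails and the version read afterwards is still not greater than `prior_version`.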
@michaelsauter same here, and it's bubbling - and stopping .. (there is no catch, except on the most outer layer)
status.latestVersion is what I am expecting as well (as the rollout is still running) ...
my theory: https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/orchestration/phases/DeployOdsComponent.groovy#L48 is a loop, somehow in one of the iterations the "prior version" is already the "new version"?
with the 2 images that are imported and cancelled (deployments) versions - that could be indeed the case... hmmm
what if we were to check if there is a deployment running - cancel that, and then rollout ourselves? .. just thoughts ...
the logic - sort of latest+1 does not seem to work as we hoped it would
the other option I could think of - in case of an exception - is to verify the containers, and if they have the latest sha's skip the err?
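A rough sketch of that "verify the containers" fallback, in Python: compare the digests recorded for each container with the digests of the images actually running, ignoring the registry and namespace prefix. The function names and the dict shapes (container name mapped to image reference) are assumptions for illustration only.

```python
from typing import Dict, List


def digest_of(image_ref: str) -> str:
    """Return the part after '@' (e.g. 'sha256:...'), or '' if there is none."""
    return image_ref.partition("@")[2]


def stale_containers(expected: Dict[str, str], running: Dict[str, str]) -> List[str]:
    """Return names of containers whose running digest differs from the expected one."""
    stale = []
    for name, expected_ref in expected.items():
        expected_digest = digest_of(expected_ref)
        # A missing digest on either side counts as "cannot confirm", i.e. stale.
        if not expected_digest or digest_of(running.get(name, "")) != expected_digest:
            stale.append(name)
    return stale
```

If `stale_containers(...)` returns an empty list after a failed rollout, the error could arguably be skipped; whether that is safe enough for a promotion to prod is a judgment call this sketch does not settle.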
Actually, what about passing the priorVersion into https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L154? The prior version is collected BEFORE we import images or change any config by applying templates. Wouldn't this priorVersion be "enough" as a sanity check in startRollout if oc rollout fails there?
Still, I do not understand the failure. Can you share the deployment descriptor file in use? And the triggers set on the DC?
3 triggers on the DC .. config change and image change (for both images) - so in reality you can get 3 deployments
a) tailor import b) image 1 c) image 2
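To see where the "up to 3 deployments" comes from, one could list the automatic triggers of a DeploymentConfig that has been parsed into a dict. The sketch below is Python and assumes the usual spec.triggers layout (type, imageChangeParams.automatic, imageChangeParams.from.name); the function name is made up.

```python
from typing import Any, Dict, List


def rollout_triggers(dc: Dict[str, Any]) -> List[str]:
    """List the triggers of a parsed DeploymentConfig that can start a rollout on their own."""
    labels = []
    for trigger in dc.get("spec", {}).get("triggers", []):
        kind = trigger.get("type", "Unknown")
        if kind == "ImageChange":
            params = trigger.get("imageChangeParams", {})
            if not params.get("automatic", False):
                continue  # non-automatic image triggers do not start rollouts by themselves
            tag = params.get("from", {}).get("name", "?")
            labels.append(f"ImageChange:{tag}")
        else:
            labels.append(kind)
    return labels
```

For a config with one ConfigChange trigger and two automatic ImageChange triggers this returns three entries, one per possible rollout cause.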
I like the idea of "just" checking whether it's greater than the prior version we get ..
the deployments.json looks as follows
{
  "deployments": {
    "front-back": {
      "containers": {
        "front-back-frontend": "front-back-frontend@sha256:b711ac347ac72aba428d717b9988949faccc83fc47cfd00a3342c2752d96c213",
        "front-back-backend": "front-back-backend@sha256:fa7d63b6f7d98fe776d7de5d23e53a4fcaa9a95bad159de6527640abedd82537"
      }
    }
  },
  "CREATED_BY_BUILD": "front-0.19.0/55"
}
deployment config in question:
apiVersion: v1
kind: ReplicationController
metadata:
  annotations:
    openshift.io/deployer-pod.completed-at: '2021-07-12 16:48:19 +0200 CEST'
    openshift.io/deployer-pod.created-at: '2021-07-12 15:11:35 +0200 CEST'
    openshift.io/deployer-pod.name: front-back-11-deploy
    openshift.io/deployment-config.latest-version: '11'
    openshift.io/deployment-config.name: front-back
    openshift.io/deployment.phase: Complete
    openshift.io/deployment.replicas: ''
    openshift.io/deployment.status-reason: image change
    openshift.io/encoded-deployment-config: >
{"kind":"DeploymentConfig","apiVersion":"apps.openshift.io/v1","metadata":{"name":"front-back","namespace":"gihkw1-prod","selfLink":"/apis/apps.openshift.io/v1/namespaces/gihkw1-prod/deploymentconfigs/front-back","uid":"837c70c3-a0e6-11eb-84af-0050569e7b02","resourceVersion":"410791490","generation":14,"creationTimestamp":"2021-04-19T08:09:07Z","labels":{"app":"gihkw1-front-back","template":"monorepo-component-template"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps.openshift.io/v1\",\"kind\":\"DeploymentConfig\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"gihkw1-front-back\",\"template\":\"monorepo-component-template\"},\"name\":\"front-back\",\"namespace\":\"gihkw1-prod\"},\"spec\":{\"replicas\":1,\"revisionHistoryLimit\":10,\"selector\":{\"app\":\"gihkw1-front-back\",\"deploymentconfig\":\"front-back\"},\"strategy\":{\"activeDeadlineSeconds\":21600,\"resources\":{},\"rollingParams\":{\"intervalSeconds\":1,\"maxSurge\":\"25%\",\"maxUnavailable\":\"25%\",\"timeoutSeconds\":600,\"updatePeriodSeconds\":1},\"type\":\"Rolling\"},\"template\":{\"metadata\":{\"labels\":{\"app\":\"gihkw1-front-back\",\"deploymentconfig\":\"front-back\",\"env\":\"dev\"}},\"spec\":{\"containers\":[{\"image\":\"gihkw1-prod/front-back-frontend:latest\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"front-back-frontend\",\"ports\":[{\"containerPort\":8080,\"protocol\":\"TCP\"}],\"resources\":{\"limits\":{\"cpu\":\"100m\",\"memory\":\"128Mi\"},\"requests\":{\"cpu\":\"50m\",\"memory\":\"128Mi\"}},\"terminationMessagePath\":\"/dev/termination-log\",\"terminationMessagePolicy\":\"File\"},{\"image\":\"gihkw1-prod/front-back-backend:latest\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"front-back-backend\",\"ports\":[{\"containerPort\":8081,\"protocol\":\"TCP\"}],\"resources\":{\"limits\":{\"cpu\":\"100m\",\"memory\":\"128Mi\"},\"requests\":{\"cpu\":\"50m\",\"memory\":\"128Mi\"}},\"terminationMessagePath\":\"/dev/termination-log\",\"terminat
ionMessagePolicy\":\"File\"}],\"dnsPolicy\":\"ClusterFirst\",\"restartPolicy\":\"Always\",\"schedulerName\":\"default-scheduler\",\"securityContext\":{},\"terminationGracePeriodSeconds\":30}},\"test\":false,\"triggers\":[{\"type\":\"ConfigChange\"},{\"imageChangeParams\":{\"automatic\":true,\"containerNames\":[\"front-back-backend\"],\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\"front-back-backend:latest\",\"namespace\":\"gihkw1-prod\"}},\"type\":\"ImageChange\"},{\"imageChangeParams\":{\"automatic\":true,\"containerNames\":[\"front-back-frontend\"],\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\"front-back-frontend:latest\",\"namespace\":\"gihkw1-prod\"}},\"type\":\"ImageChange\"}]}}\n"}},"spec":{"strategy":{"type":"Rolling","rollingParams":{"updatePeriodSeconds":1,"intervalSeconds":1,"timeoutSeconds":600,"maxUnavailable":"25%","maxSurge":"25%"},"resources":{},"activeDeadlineSeconds":21600},"triggers":[{"type":"ConfigChange"},{"type":"ImageChange","imageChangeParams":{"automatic":true,"containerNames":["front-back-backend"],"from":{"kind":"ImageStreamTag","namespace":"gihkw1-prod","name":"front-back-backend:latest"},"lastTriggeredImage":"..../gihkw1-test/front-back-backend@sha256:fa7d63b6f7d98fe776d7de5d23e53a4fcaa9a95bad159de6527640abedd82537"}},{"type":"ImageChange","imageChangeParams":{"automatic":true,"containerNames":["front-back-frontend"],"from":{"kind":"ImageStreamTag","namespace":"gihkw1-prod","name":"front-back-frontend:latest"},"lastTriggeredImage":"...../gihkw1-test/front-back-frontend@sha256:b711ac347ac72aba428d717b9988949faccc83fc47cfd00a3342c2752d96c213"}}],"replicas":1,"revisionHistoryLimit":10,"test":false,"selector":{"app":"gihkw1-front-back","deploymentconfig":"front-back"},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"gihkw1-front-back","deploymentconfig":"front-back","env":"dev"}},"spec":{"containers":[{"name":"front-back-frontend","image":"....../gihkw1-test/front-back-frontend@sha256:b711ac347ac72aba428d717b9988949facc
c83fc47cfd00a3342c2752d96c213","ports":[{"containerPort":8080,"protocol":"TCP"}],"resources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"50m","memory":"128Mi"}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"},{"name":"front-back-backend","image":"..../gihkw1-test/front-back-backend@sha256:fa7d63b6f7d98fe776d7de5d23e53a4fcaa9a95bad159de6527640abedd82537","ports":[{"containerPort":8081,"protocol":"TCP"}],"resources":{"limits":{"cpu":"100m","memory":"128Mi"},"requests":{"cpu":"50m","memory":"128Mi"}},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","securityContext":{},"schedulerName":"default-scheduler"}}},"status":{"latestVersion":11,"observedGeneration":13,"replicas":1,"updatedReplicas":0,"availableReplicas":1,"unavailableReplicas":0,"details":{"message":"image
change","causes":[{"type":"ImageChange","imageTrigger":{"from":{"kind":"DockerImage","name":"..../gihkw1-test/front-back-backend@sha256:fa7d63b6f7d98fe776d7de5d23e53a4fcaa9a95bad159de6527640abedd82537"}}}]},"conditions":[{"type":"Available","status":"True","lastUpdateTime":"2021-07-11T16:32:33Z","lastTransitionTime":"2021-07-11T16:32:33Z","message":"Deployment
config has minimum
availability."},{"type":"Progressing","status":"Unknown","lastUpdateTime":"2021-07-12T13:11:24Z","lastTransitionTime":"2021-07-12T13:11:24Z","message":"replication
controller \"front-back-10\" is waiting for pod \"front-back-10-deploy\"
to run"}],"readyReplicas":1}}
  creationTimestamp: '2021-07-12T13:11:35Z'
  generation: 2
  labels:
    app: gihkw1-front-back
    openshift.io/deployment-config.name: front-back
    template: monorepo-component-template
  name: front-back-11
  namespace: gihkw1-prod
  ownerReferences:
    - apiVersion: apps.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: DeploymentConfig
      name: front-back
      uid: 837c70c3-a0e6-11eb-84af-0050569e7b02
  resourceVersion: '410828255'
  selfLink: /api/v1/namespaces/gihkw1-prod/replicationcontrollers/front-back-11
  uid: af261b81-e312-11eb-bd77-0050569e3b56
spec:
  replicas: 1
  selector:
    app: gihkw1-front-back
    deployment: front-back-11
    deploymentconfig: front-back
  template:
    metadata:
      annotations:
        openshift.io/deployment-config.latest-version: '11'
        openshift.io/deployment-config.name: front-back
        openshift.io/deployment.name: front-back-11
      creationTimestamp: null
      labels:
        app: gihkw1-front-back
        deployment: front-back-11
        deploymentconfig: front-back
        env: dev
    spec:
      containers:
        - image: >-
            ...../gihkw1-test/front-back-frontend@sha256:b711ac347ac72aba428d717b9988949faccc83fc47cfd00a3342c2752d96c213
          imagePullPolicy: IfNotPresent
          name: front-back-frontend
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 128Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        - image: >-
            ...../gihkw1-test/front-back-backend@sha256:fa7d63b6f7d98fe776d7de5d23e53a4fcaa9a95bad159de6527640abedd82537
          imagePullPolicy: IfNotPresent
          name: front-back-backend
          ports:
            - containerPort: 8081
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 128Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
Thanks for sharing. Unfortunately I do not understand it.
Given there is only one deployment, the loop only runs once. As you say, three things might trigger a rollout, but all of them would happen AFTER we get the priorVersion in https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/orchestration/phases/DeployOdsComponent.groovy#L44. Triggering multiple times should be OK as we only check if the number is greater, not that it is n+1.
If https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L150 still returns the same version as priorVersion, we attempt a rollout, which might fail, but then get the version again in https://github.com/opendevstack/ods-jenkins-shared-library/blob/3.x/src/org/ods/services/OpenShiftService.groovy#L195. At this point it most certainly should be updated as a rollout is definitely running. So how could it be the same version still? I fail to see a reason .... or maybe our understanding of when this value gets updated is wrong?
From https://docs.openshift.com/container-platform/3.9/rest_api/oapi/v1.DeploymentConfig.html:
A deployment is "triggered" when its configuration is changed or a tag in an Image Stream is changed. Triggers can be disabled to allow manual control over a deployment. The "strategy" determines how the deployment is carried out and may be changed at any time. The latestVersion field is updated when a new deployment is triggered by any means.
I no longer think that passing the priorVersion will really help. Only additional logging will help there.
BTW, is this reproducible?
yup . it's one of those fun bugs ... :) only reproduces on the way to prod
shows the "two deployments" .. one cancelled - one rolled out ..
@michaelsauter the PR will dump information on the deployment ids .. so hopefully that also helps to diagnose this ..
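As an illustration of the kind of dump that could help here (not the contents of the PR), the sketch below summarizes a list of replication controllers, already parsed into dicts, by version and phase. It is Python; the annotation keys are the ones visible in the descriptor posted above, while the function name and output format are invented.

```python
from typing import Any, Dict, List

VERSION_KEY = "openshift.io/deployment-config.latest-version"
PHASE_KEY = "openshift.io/deployment.phase"
REASON_KEY = "openshift.io/deployment.status-reason"


def summarize_rollouts(rcs: List[Dict[str, Any]]) -> List[str]:
    """Return one log line per replication controller, ordered by version."""
    rows = []
    for rc in rcs:
        meta = rc.get("metadata", {})
        annotations = meta.get("annotations", {})
        # Assumption: a missing or non-numeric version annotation sorts first as 0.
        raw_version = str(annotations.get(VERSION_KEY, "0"))
        version = int(raw_version) if raw_version.isdigit() else 0
        rows.append((
            version,
            meta.get("name", "?"),
            annotations.get(PHASE_KEY, "?"),
            annotations.get(REASON_KEY, ""),
        ))
    rows.sort()
    return [f"#{v} {name}: phase={phase} reason={reason}" for v, name, phase, reason in rows]
```

Logging these lines before and after the rollout attempt would show which version numbers existed at each point, which is the information missing from the original output.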
@jorge-romero @s2oBCN please have a look if this bug affects our demo application
I believe @martin - that you may need quite some luck to repro this ...
Describe the bug
Mono-repo with 2 images (and image triggers to :latest) fails during promote to Q/P with
script exit -1 at .rollout
step 1: import 2 images:
later we try to rollout the dc .. and it's (obviously) running already, ... (but this should be accounted for, and for some reason, still breaks)