jenkinsci / kubernetes-operator

Kubernetes native Jenkins Operator
https://jenkinsci.github.io/kubernetes-operator

Waiting for reconnection - after updating master image using operator #895

Closed JCzz closed 9 months ago

JCzz commented 11 months ago

Describe the bug

After updating or patching the master image, the jnlp container in the job/project pod is unable to reconnect to the master.

To Reproduce

  1. Create a kind cluster

    kind create cluster
  2. Apply Operator CRDs

    kubectl apply -f https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/config/crd/bases/jenkins.io_jenkins.yaml
  3. Apply Operator

    kubectl apply -f https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/deploy/all-in-one-v1alpha2.yaml
  4. Apply Persistent Volume Claim:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jenkins-backup
      namespace: default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 500Gi
    EOF

  5. Apply Jenkins master:

kubectl apply -f - <<EOF
apiVersion: jenkins.io/v1alpha2
kind: Jenkins
metadata:
  name: example
  namespace: default
spec:
  configurationAsCode:
    configurations: []
    secret:
      name: ""
  groovyScripts:
    configurations: []
    secret:
      name: ""
  jenkinsAPISettings:
    authorizationStrategy: createUser
  master:
    disableCSRFProtection: false
    containers:
      - name: jenkins-master
        image: jenkins/jenkins:jdk11
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 12
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 100
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 80
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 1500m
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 500Mi
      - name: backup # container responsible for the backup and restore
        env:
          - name: BACKUP_DIR
            value: /backup
          - name: JENKINS_HOME
            value: /jenkins-home
          - name: BACKUP_COUNT
            value: "3" # keep only the 2 most recent backups
        image: virtuslab/jenkins-operator-backup-pvc:v0.1.1 # look at backup/pvc directory
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - mountPath: /jenkins-home # Jenkins home volume
            name: jenkins-home
          - mountPath: /backup # backup volume
            name: backup
        resources:
          limits:
            cpu: 1000m
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 500Mi
    volumes:
    - name: backup # PVC volume where backups will be stored
      persistentVolumeClaim:
        claimName: jenkins-backup
  backup:
    containerName: backup # name of the container responsible for backup
    action:
      exec:
        command:
          - /home/user/bin/backup.sh # this command is invoked on "backup" container to make backup, for example /home/user/bin/backup.sh <backup_number>, <backup_number> is passed by operator
    interval: 30 # how often to make a backup, in seconds
    makeBackupBeforePodDeletion: true # make a backup before pod deletion
  restore:
    containerName: backup # name of the container responsible for restoring a backup
    action:
      exec:
        command:
          - /home/user/bin/restore.sh # this command is invoked on "backup" container to make restore backup, for example /home/user/bin/restore.sh <backup_number>, <backup_number> is passed by operator
    #recoveryOnce: <backup_number> # to restore a specific backup, set this field; Jenkins will then be restarted and the desired backup restored
    getLatestAction:
      exec:
        command:
          - /home/user/bin/get-latest.sh # this command is invoked on "backup" container to get last backup number before pod deletion; not having it in the CR may cause loss of data
  seedJobs:
    - id: jenkins-operator
      targets: "cicd/jobs/*.jenkins"
      description: "Jenkins Operator repository"
      repositoryBranch: master
      repositoryUrl: https://github.com/jczz/jenkins-operator.git
EOF

Note on the above:

I am not using the seed jobs from the example. Instead, I created a job that lasts 3 minutes, from: https://github.com/jczz/jenkins-operator.git

  1. Wait for pods to be running:

    kubectl get pods -w
  2. Get username and password:

    # the username is "jenkins-operator"; the command below prints the password
    kubectl get secret jenkins-operator-credentials-example -o 'jsonpath={.data.password}' | base64 -d
  3. Forward port:

    pkill kubectl -9
    kubectl port-forward jenkins-example 8080:8080 &
  4. Start jobs: Now go and start some jobs in the Jenkins web UI.

  5. Update/patch jenkins master:

    kubectl patch jenkins example --type='json' -p='[{"op": "replace", "path": "/spec/master/containers/0/resources/limits/cpu", "value": "1600m"}]'
  6. Wait for the jenkins-example pod to be up and running again:

    kubectl get pods -w
  7. Forward port (again):

    pkill kubectl -9
    kubectl port-forward jenkins-example 8080:8080 &
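
For anyone scripting steps 5 and 6, here is a minimal sketch of waiting for the operator to replace the master pod rather than watching `kubectl get pods -w` by hand. It assumes the pod keeps the name `jenkins-example` and is recreated with a new UID after the patch. The function names, retry count and delay are illustrative and are not part of the operator.

```shell
# Sketch only: helper names and defaults are hypothetical.

pod_uid() {
  # Prints the UID of the named pod, or nothing if it does not exist right now.
  kubectl get pod "$1" -o 'jsonpath={.metadata.uid}' 2>/dev/null
}

wait_for_replacement() {
  # $1 = pod name, $2 = UID recorded before the patch,
  # $3 = number of polls (default 60), $4 = seconds between polls (default 5)
  local pod="$1" old_uid="$2" tries="${3:-60}" delay="${4:-5}" uid i=0
  while [ "$i" -lt "$tries" ]; do
    uid="$(pod_uid "$pod")"
    if [ -n "$uid" ] && [ "$uid" != "$old_uid" ]; then
      # A pod with a different UID exists: block until it reports Ready.
      kubectl wait --for=condition=Ready "pod/$pod" --timeout=600s
      return
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "pod $pod was not replaced in time" >&2
  return 1
}
```

Usage would be: record `old=$(pod_uid jenkins-example)` before running the `kubectl patch` from step 5, then call `wait_for_replacement jenkins-example "$old"` before forwarding the port again. Comparing UIDs avoids the case where `kubectl wait` returns immediately because the old pod is still Ready.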

Error:

Logs from the Jenkins UI console:

Resuming build at Fri Aug 18 13:11:19 UTC 2023 after Jenkins restart
Waiting for reconnection of k8sagent-e2e-q666n-fwdpt before proceeding with build

See logs from the jnlp container:

kubectl logs k8sagent-e2e-<number> -c jnlp
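
Since the agent pod name carries a generated suffix, a small loop can collect the jnlp logs from every matching pod at once. This is a sketch that assumes the agent pods share the `k8sagent-e2e-` name prefix seen in the console output above. The function names and the `--tail=50` limit are illustrative choices.

```shell
# Sketch only: helper names are hypothetical.

agent_pods() {
  # Lists pod names in the current namespace that start with the given prefix.
  kubectl get pods -o name | sed 's|^pod/||' | grep -- "^$1"
}

dump_jnlp_logs() {
  # $1 = pod name prefix (defaults to the prefix from this report)
  local prefix="${1:-k8sagent-e2e-}" pod
  for pod in $(agent_pods "$prefix"); do
    echo "=== $pod ==="
    # Print the last 50 lines of the jnlp container in each agent pod.
    kubectl logs "$pod" -c jnlp --tail=50
  done
}
```

Running `dump_jnlp_logs` right after the master pod is back should show what each agent logs while it tries to reconnect.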

Additional information

Kubernetes version:

    Client Version: v1.27.2
    Kustomize Version: v5.0.1
    Server Version: v1.27.3

Jenkins Operator version: latest
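
Because "latest" and the `jdk11` tag both move over time, the exact images in use can be read back from the cluster. The sketch below assumes the operator runs as a Deployment named `jenkins-operator` in the `default` namespace, which is a guess based on the all-in-one manifest and should be adjusted if it differs. The function names are illustrative.

```shell
# Sketch only: helper names and default names are assumptions.

operator_images() {
  # $1 = Deployment name (assumed), $2 = namespace (assumed)
  local deploy="${1:-jenkins-operator}" ns="${2:-default}"
  # Prints the image reference(s) configured on the operator Deployment.
  kubectl get deployment "$deploy" -n "$ns" \
    -o 'jsonpath={.spec.template.spec.containers[*].image}'
}

master_image_id() {
  # Prints the resolved image ID of the jenkins-master container in the pod.
  kubectl get pod "${1:-jenkins-example}" \
    -o 'jsonpath={.status.containerStatuses[?(@.name=="jenkins-master")].imageID}'
}
```

Attaching the output of `operator_images` and `master_image_id` to the report would pin down which builds were actually running.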