jenkinsci / kubernetes-operator

Kubernetes native Jenkins Operator
https://jenkinsci.github.io/kubernetes-operator

Environment mutation causing pod to restart #586

Open doncorsean opened 3 years ago

doncorsean commented 3 years ago

After CONSTANT struggles to get any version of the operator to work, we tried the latest release (0.6.0). We are getting an error about a missing volume:

Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"

Steps to reproduce the behavior: Follow the installation documentation for the 0.6.0 release: https://jenkinsci.github.io/kubernetes-operator/docs/installation/

Additional information

Kubernetes version: 1.19.6
Jenkins Operator version: 0.6.0

EDIT:

After debugging by @ambrons, we discovered that New Relic is injecting some env vars into the jenkins pod, which causes a couple of potentially misleading error messages and makes the pod restart indefinitely.
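To make the mechanism concrete: the events later in this thread report "Env has changed" as the restart reason. Below is a minimal sketch of how a strict comparison between the env a controller wants and the env it finds on the pod would flag an injected variable. This is not the operator's actual code. `env_drift` is a hypothetical helper, and `NEW_RELIC_EXAMPLE` is a made-up variable name, since the thread does not say which variables were injected.

```python
from typing import Dict, List


def env_drift(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return human-readable differences between desired and actual env."""
    changes = []
    for name in sorted(set(desired) | set(actual)):
        if name not in actual:
            changes.append(f"missing: {name}")
        elif name not in desired:
            changes.append(f"unexpected: {name}")
        elif desired[name] != actual[name]:
            changes.append(f"changed: {name}")
    return changes


# Desired env for the jenkins-master container (values from the operator log).
desired = {
    "COPY_REFERENCE_FILE_LOG": "/var/lib/jenkins/copy_reference_file.log",
    "JENKINS_HOME": "/var/lib/jenkins",
}

# Env found on the pod after something else added a variable.
# NEW_RELIC_EXAMPLE is an invented name used only for illustration.
actual = dict(desired, NEW_RELIC_EXAMPLE="1")

print(env_drift(desired, actual))  # ['unexpected: NEW_RELIC_EXAMPLE']
```

If every non-empty result triggers a pod restart and the injector adds the variable again on each new pod, the two would keep undoing each other, which matches the indefinite restarts described here.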

thecooldrop commented 3 years ago

It does work. I am using it currently in a local kind cluster. What you are describing is an issue with your Kubernetes setup.

prryb commented 3 years ago

Hello, well, it does work. I've just managed to deploy the operator along with Jenkins (on minikube 1.21.0 and Kubernetes 1.21.0) using the Helm chart from the master branch.

Could you provide some more details:

Jenkins uses emptyDir (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) for "jenkins-home" – for some reason, it's not being created by your Kubernetes.

ambrons commented 3 years ago

@prryb

We are using EKS (1.19.6) and we do not have issues with emptyDir for other pods. See this example pod with an emptyDir in the same workspace:

➜  aws-infra git:(feature/aws-jenkins) cat <<EOM | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: myvolumes-pod
spec:
  containers:
  - image: alpine
    imagePullPolicy: IfNotPresent
    name: myvolumes-container
    command: ['sh', '-c', 'echo The Bench Container 1 is Running ; sleep 3600']
    volumeMounts:
    - mountPath: /demo
      name: demo-volume
  volumes:
  - name: demo-volume
    emptyDir: {}
EOM
pod/myvolumes-pod created
➜  aws-infra git:(feature/aws-jenkins) k get pod
NAME            READY   STATUS    RESTARTS   AGE
myvolumes-pod   1/1     Running   0          5s
➜  aws-infra git:(feature/aws-jenkins) k get pod
NAME            READY   STATUS    RESTARTS   AGE
myvolumes-pod   1/1     Running   0          9s
➜  aws-infra git:(feature/aws-jenkins) k exec -it myvolumes-pod -- sh
/ # ls -la /demo/
total 0
drwxrwxrwx    2 root     root             6 Jun 22 11:37 .
drwxr-xr-x    1 root     root            41 Jun 22 11:37 ..
/ # cd /demo
/demo # ls
/demo # touch worksforme
/demo # ls -la
total 0
drwxrwxrwx    2 root     root            24 Jun 22 11:37 .
drwxr-xr-x    1 root     root            41 Jun 22 11:37 ..
-rw-r--r--    1 root     root             0 Jun 22 11:37 worksforme
/demo #

We are deploying via helm chart using the documentation here for the operator: https://jenkinsci.github.io/kubernetes-operator/docs/installation/#deploy-jenkins-operator-using-helm-chart

Here's the values file for our operator:

# Jenkins Operator Helm chart

# Jenkins instance configuration
jenkins:

  # name of resource
  # The pod name will be jenkins-<name> (name will be set as suffix)
  name: test

  # namespace is the namespace where the resources will be deployed
  # It's not recommended to use default namespace
  # Create new namespace for jenkins (called e.g. jenkins)
  namespace: jenkins

  # basePlugins are plugins installed and required by the operator
  # Shouldn't contain plugins defined by user
  # You can change their versions here
  # See https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/customization/#install-plugins for more details
  #
  # Example:
  #
  basePlugins:
  - name: kubernetes
    version: 1.29.6
  - name: workflow-job
    version: "2.41"
  - name: workflow-aggregator
    version: "2.6"
  - name: git
    version: 4.7.2
  - name: job-dsl
    version: "1.77"
  - name: configuration-as-code
    version: "1.51"
  - name: kubernetes-credentials-provider
    version: 0.18-1

  # volumes used by Jenkins
  # By default, we are only using backup
  volumes:
    - name: backup # PVC volume where backups will be stored
      persistentVolumeClaim:
        claimName: jenkins-backup

  # volumeMounts are mounts for Jenkins pod
  volumeMounts: []

  # backup is section for configuring operator's backup feature
  # By default backup feature is enabled and pre-configured
  # This section simplifies the configuration described here: https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/configure-backup-and-restore/
  # For customization tips see https://jenkinsci.github.io/kubernetes-operator/docs/getting-started/latest/custom-backup-and-restore/
  backup:

    # volumeMounts holds the mount points for volumes
    volumeMounts:
      - name: jenkins-home
        mountPath: /jenkins-home # Jenkins home volume
      - mountPath: /backup # backup volume
        name: backup

# operator is section for configuring operator deployment
operator:

  # image is the name (and tag) of the Jenkins Operator image
  image: virtuslab/jenkins-operator:v0.6.0

This was pulled from here with very minor tweaks to jenkins.name and jenkins.namespace. Values file location: https://github.com/jenkinsci/kubernetes-operator/blob/master/chart/jenkins-operator/values.yaml

Snippet of jenkins-operator logs:

2021-06-22T13:12:41.178Z    INFO    controller-jenkins  Env has changed to '[{Name:COPY_REFERENCE_FILE_LOG Value:/var/lib/jenkins/copy_reference_file.log ValueFrom:nil} {Name:JAVA_OPTS Value:-XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true ValueFrom:nil} {Name:JENKINS_HOME Value:/var/lib/jenkins ValueFrom:nil}]' in container 'jenkins-master'  {"cr": "test"}
2021-06-22T13:12:41.178Z    INFO    controller-jenkins  Env has changed to '[{Name:BACKUP_DIR Value:/backup ValueFrom:nil} {Name:JENKINS_HOME Value:/jenkins-home ValueFrom:nil} {Name:BACKUP_COUNT Value:3 ValueFrom:nil}]' in container 'backup'  {"cr": "test"}
2021-06-22T13:12:41.189Z    DEBUG   controller-jenkins  Reconciling Jenkins {"cr": "test"}
2021-06-22T13:12:41.190Z    DEBUG   controller-jenkins  Operator credentials secret is present  {"cr": "test"}
2021-06-22T13:12:41.211Z    DEBUG   controller-jenkins  Scripts config map is present   {"cr": "test"}
2021-06-22T13:12:41.227Z    DEBUG   controller-jenkins  Init configuration config map is present    {"cr": "test"}
2021-06-22T13:12:41.257Z    DEBUG   controller-jenkins  Base configuration config map is present    {"cr": "test"}
2021-06-22T13:12:41.257Z    DEBUG   controller-jenkins  GroovyScripts Secret and ConfigMap added watched labels {"cr": "test"}
2021-06-22T13:12:41.257Z    DEBUG   controller-jenkins  ConfigurationAsCode Secret and ConfigMap added watched labels   {"cr": "test"}
2021-06-22T13:12:41.257Z    DEBUG   controller-jenkins  createServiceAccount with annotations map[] {"cr": "test"}
2021-06-22T13:12:41.445Z    DEBUG   controller-jenkins  Service account, role and role binding are present  {"cr": "test"}
2021-06-22T13:12:41.445Z    DEBUG   controller-jenkins  Extra role bindings are present {"cr": "test"}
2021-06-22T13:12:41.457Z    DEBUG   controller-jenkins  Jenkins HTTP Service is present {"cr": "test"}
2021-06-22T13:12:41.471Z    DEBUG   controller-jenkins  Jenkins slave Service is present    {"cr": "test"}
2021-06-22T13:12:41.471Z    DEBUG   controller-jenkins  Kubernetes resources are present    {"cr": "test"}
2021-06-22T13:12:41.471Z    DEBUG   controller-jenkins  Jenkins master pod is present   {"cr": "test"}
2021-06-22T13:12:41.471Z    DEBUG   controller-jenkins  Jenkins master pod is terminating   {"cr": "test"}
2021-06-22T13:12:41.472Z    DEBUG   controller-jenkins  Reconciling Jenkins {"cr": "test"}
2021-06-22T13:12:41.472Z    DEBUG   controller-jenkins  Operator credentials secret is present  {"cr": "test"}
2021-06-22T13:12:41.495Z    DEBUG   controller-jenkins  Scripts config map is present   {"cr": "test"}
2021-06-22T13:12:41.513Z    DEBUG   controller-jenkins  Init configuration config map is present    {"cr": "test"}
2021-06-22T13:12:41.539Z    DEBUG   controller-jenkins  Base configuration config map is present    {"cr": "test"}
2021-06-22T13:12:41.539Z    DEBUG   controller-jenkins  GroovyScripts Secret and ConfigMap added watched labels {"cr": "test"}
2021-06-22T13:12:41.539Z    DEBUG   controller-jenkins  ConfigurationAsCode Secret and ConfigMap added watched labels   {"cr": "test"}
2021-06-22T13:12:41.539Z    DEBUG   controller-jenkins  createServiceAccount with annotations map[] {"cr": "test"}
2021-06-22T13:12:41.716Z    DEBUG   controller-jenkins  Service account, role and role binding are present  {"cr": "test"}
2021-06-22T13:12:41.717Z    DEBUG   controller-jenkins  Extra role bindings are present {"cr": "test"}
2021-06-22T13:12:41.728Z    DEBUG   controller-jenkins  Jenkins HTTP Service is present {"cr": "test"}
2021-06-22T13:12:41.740Z    DEBUG   controller-jenkins  Jenkins slave Service is present    {"cr": "test"}
2021-06-22T13:12:41.740Z    DEBUG   controller-jenkins  Kubernetes resources are present    {"cr": "test"}
2021-06-22T13:12:41.740Z    DEBUG   controller-jenkins  Jenkins master pod is present   {"cr": "test"}
2021-06-22T13:12:41.740Z    DEBUG   controller-jenkins  Jenkins master pod is terminating   {"cr": "test"}
2021-06-22T13:12:46.472Z    DEBUG   controller-jenkins  Reconciling Jenkins {"cr": "test"}
2021-06-22T13:12:46.472Z    DEBUG   controller-jenkins  Operator credentials secret is present  {"cr": "test"}
2021-06-22T13:12:46.492Z    DEBUG   controller-jenkins  Scripts config map is present   {"cr": "test"}
2021-06-22T13:12:46.511Z    DEBUG   controller-jenkins  Init configuration config map is present    {"cr": "test"}
2021-06-22T13:12:46.540Z    DEBUG   controller-jenkins  Base configuration config map is present    {"cr": "test"}
2021-06-22T13:12:46.540Z    DEBUG   controller-jenkins  GroovyScripts Secret and ConfigMap added watched labels {"cr": "test"}
2021-06-22T13:12:46.540Z    DEBUG   controller-jenkins  ConfigurationAsCode Secret and ConfigMap added watched labels   {"cr": "test"}
2021-06-22T13:12:46.540Z    DEBUG   controller-jenkins  createServiceAccount with annotations map[] {"cr": "test"}
2021-06-22T13:12:46.738Z    DEBUG   controller-jenkins  Service account, role and role binding are present  {"cr": "test"}
2021-06-22T13:12:46.738Z    DEBUG   controller-jenkins  Extra role bindings are present {"cr": "test"}
2021-06-22T13:12:46.750Z    DEBUG   controller-jenkins  Jenkins HTTP Service is present {"cr": "test"}
2021-06-22T13:12:46.761Z    DEBUG   controller-jenkins  Jenkins slave Service is present    {"cr": "test"}
2021-06-22T13:12:46.761Z    DEBUG   controller-jenkins  Kubernetes resources are present    {"cr": "test"}
2021-06-22T13:12:46.761Z    DEBUG   controller-jenkins  Jenkins master pod is present   {"cr": "test"}
2021-06-22T13:12:46.761Z    DEBUG   controller-jenkins  Jenkins master pod is terminating   {"cr": "test"}
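The two "Env has changed to" lines at the top of this log spell out the env the operator expects for each container. A small sketch for pulling the container name and variable names out of such a line, so they can be compared by hand against the env on the running pod. `expected_env_names` is a hypothetical helper, and the regular expressions assume the exact log format pasted above.

```python
import re

# One of the log lines from above, copied verbatim.
LOG_LINE = (
    "2021-06-22T13:12:41.178Z    INFO    controller-jenkins  Env has changed to "
    "'[{Name:BACKUP_DIR Value:/backup ValueFrom:nil} "
    "{Name:JENKINS_HOME Value:/jenkins-home ValueFrom:nil} "
    "{Name:BACKUP_COUNT Value:3 ValueFrom:nil}]' in container 'backup'  "
    "{\"cr\": \"test\"}"
)


def expected_env_names(line):
    """Return (container, [env names]) for an 'Env has changed to' line, else None."""
    match = re.search(r"Env has changed to '\[(.*)\]' in container '([^']+)'", line)
    if match is None:
        return None
    # Each entry looks like {Name:FOO Value:... ValueFrom:nil}
    names = re.findall(r"\{Name:(\S+) ", match.group(1))
    return match.group(2), names


print(expected_env_names(LOG_LINE))
# ('backup', ['BACKUP_DIR', 'JENKINS_HOME', 'BACKUP_COUNT'])
```

Any variable present on the pod but absent from this list is a candidate for having been added by something other than the operator.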

kubectl get events --sort-by='{.lastTimestamp}' output:

2m38s       Normal    ScalingReplicaSet        deployment/jenkins-operator              Scaled up replica set jenkins-operator-8666f97b68 to 1
2m38s       Normal    SuccessfulCreate         replicaset/jenkins-operator-8666f97b68   Created pod: jenkins-operator-8666f97b68-7m7tv
2m38s       Normal    Scheduled                pod/jenkins-operator-8666f97b68-7m7tv    Successfully assigned jenkins/jenkins-operator-8666f97b68-7m7tv to ip-10-150-227-194.ec2.internal
2m36s       Normal    Created                  pod/jenkins-operator-8666f97b68-7m7tv    Created container jenkins-operator
2m36s       Normal    Started                  pod/jenkins-operator-8666f97b68-7m7tv    Started container jenkins-operator
2m36s       Normal    Pulled                   pod/jenkins-operator-8666f97b68-7m7tv    Container image "virtuslab/jenkins-operator:v0.6.0" already present on machine
2m29s       Normal    WaitForFirstConsumer     persistentvolumeclaim/jenkins-backup     waiting for first consumer to be created before binding
2m19s       Normal    LeaderElection           lease/c674355f.jenkins.io                jenkins-operator-8666f97b68-7m7tv_e65fedcc-eebd-4360-bf9d-7be1c0dc0df1 became leader
2m19s       Normal    LeaderElection           configmap/c674355f.jenkins.io            jenkins-operator-8666f97b68-7m7tv_e65fedcc-eebd-4360-bf9d-7be1c0dc0df1 became leader
2m18s       Normal                             jenkins/test                             Jenkins master pod restarted by operator:; Jenkins Operator version has changed; Jenkins CR has been replaced; Env has changed; Env has changed
2m18s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m17s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m17s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m17s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m17s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m16s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m16s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m15s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m15s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m15s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m15s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m14s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m14s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m14s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m14s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m13s       Normal    ProvisioningSucceeded    persistentvolumeclaim/jenkins-backup     Successfully provisioned volume pvc-02c23f18-0a38-4b05-9423-f6d737e29336 using kubernetes.io/aws-ebs
2m13s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m13s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m12s       Warning   FailedScheduling         pod/jenkins-test                         Binding rejected: plugin "DefaultBinder" failed to bind pod "jenkins/jenkins-test": Operation cannot be fulfilled on pods/binding "jenkins-test": pod jenkins-test is being deleted, cannot be assigned to a host
2m12s       Warning   FailedScheduling         pod/jenkins-test                         error while running "VolumeBinding" prebind plugin for pod "jenkins-test": Failed to bind volumes: pod "jenkins/jenkins-test" does not exist any more
2m12s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m12s       Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
2m10s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m9s        Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
2m          Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
119s        Normal                             jenkins/test                             Jenkins master pod restarted by operator:; Env has changed; Env has changed
110s        Normal                             jenkins/test                             Creating a new Jenkins Master Pod
110s        Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
107s        Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
100s        Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
90s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
87s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
80s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
70s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
67s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
60s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
50s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
47s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
40s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
30s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
27s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
20s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
10s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
9s          Warning   FailedMount              pod/jenkins-test                         Unable to attach or mount volumes: unmounted volumes=[backup jenkins-home scripts init-configuration operator-credentials jenkins-operator-test-token-hmpck], unattached volumes=[backup jenkins-home scripts init-configuration operator-credentials jenkins-operator-test-token-hmpck]: timed out waiting for the condition
7s          Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
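For a long dump like this, tallying the lines by reason makes the pattern easier to see. A minimal sketch, assuming the columns are separated by two or more spaces as in the paste above. `reason_counts` is a hypothetical helper. Rows whose reason column is empty (the operator's own jenkins/test events) have only four fields and are skipped.

```python
import re
from collections import Counter

# A few lines copied from the events output above.
EVENTS = """\
119s        Normal                             jenkins/test                             Jenkins master pod restarted by operator:; Env has changed; Env has changed
2m18s       Warning   FailedScheduling         pod/jenkins-test                         skip schedule deleting pod: jenkins/jenkins-test
2m12s       Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
2m9s        Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-02c23f18-0a38-4b05-9423-f6d737e29336"
"""


def reason_counts(events_text):
    """Count events per reason; expects age, type, reason, object, message columns."""
    counts = Counter()
    for line in events_text.splitlines():
        fields = re.split(r"\s{2,}", line.strip(), maxsplit=4)
        if len(fields) == 5:  # rows with an empty reason column split into 4 fields
            counts[fields[2]] += 1
    return counts


print(reason_counts(EVENTS))
```

Run against the full output, a tally like this would show Scheduled and SuccessfulAttachVolume repeating every few seconds, which points at the pod being recreated over and over rather than at a one-off mount failure.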

Here's the kubectl describe pod jenkins-test volume section:

Volumes:
  jenkins-home:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      jenkins-operator-scripts-test
    Optional:  false
  init-configuration:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      jenkins-operator-init-configuration-test
    Optional:  false
  operator-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-operator-credentials-test
    Optional:    false
  backup:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins-backup
    ReadOnly:   false
  jenkins-operator-test-token-hmpck:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-operator-test-token-hmpck
    Optional:    false

All the objects for the mounts are there:

pvc:
jenkins-backup   Bound   pvc-02c23f18-0a38-4b05-9423-f6d737e29336   5Gi   RWO   gp2-encrypted   8m52s

cm:
c674355f.jenkins.io                        0   39m
jenkins-operator-base-configuration-test   8   8m46s
jenkins-operator-init-configuration-test   1   8m46s
jenkins-operator-scripts-test              2   8m46s

secrets:
jenkins-operator-credentials-test        Opaque                                2   9m3s
jenkins-operator-test-token-hmpck        kubernetes.io/service-account-token   3   9m3s
jenkins-operator-token-j57mr             kubernetes.io/service-account-token   3   9m22s
sh.helm.release.v1.jenkins-operator.v1   helm.sh/release.v1                    1   9m22s
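One way to double-check the "cannot find volume" message against the pod spec itself is to confirm that every volumeMount names a volume that exists in spec.volumes. A minimal sketch: `unmatched_mounts` is a hypothetical helper, and the pod dict below is a hand-written stand-in with jenkins-home deliberately left out. A real one could be loaded from the output of `kubectl get pod jenkins-test -o json`.

```python
def unmatched_mounts(pod):
    """Return (container, mount name) pairs with no matching entry in spec.volumes."""
    spec = pod.get("spec", {})
    volume_names = {volume["name"] for volume in spec.get("volumes", [])}
    missing = []
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in volume_names:
                missing.append((container["name"], mount["name"]))
    return missing


# Stand-in pod spec; jenkins-home is omitted from volumes on purpose
# to show what a genuinely missing volume would look like.
pod = {
    "spec": {
        "containers": [
            {
                "name": "jenkins-master",
                "volumeMounts": [
                    {"name": "jenkins-home", "mountPath": "/var/lib/jenkins"}
                ],
            }
        ],
        "volumes": [
            {"name": "backup", "persistentVolumeClaim": {"claimName": "jenkins-backup"}}
        ],
    }
}

print(unmatched_mounts(pod))  # [('jenkins-master', 'jenkins-home')]
```

Since the describe output above does list jenkins-home among the pod's volumes, this check would presumably come back empty on the real pod, which suggests the message is a symptom of the pod being torn down mid-start rather than of a volume missing from the spec.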

ambrons commented 3 years ago

@prryb We are using helm:

helm upgrade --install jenkins-operator jenkins/jenkins-operator --values=jenkins-operator-test.values.yaml -n jenkins

We're struggling to understand why it can't find the jenkins-home volume, considering it's an emptyDir. We tried defining our own:

  volumes:
    - name: jenkins-home
      emptyDir: {}

When we do this, the operator complains that we can't define our own jenkins-home. So it seems like the operator knows what's going on. Also, when we describe the pod we see the jenkins-home volume and its default mount of /var/lib/jenkins.

As stated and shown above, we've not had any other issues with emptyDir until now. I've noticed that over the last few months a handful of people have posted essentially the same issue under another title:

I let the operator and pod run for a while, and here is a representative snippet of the events log:

50s         Warning   FailedMount              pod/jenkins-test                         Unable to attach or mount volumes: unmounted volumes=[operator-credentials jenkins-operator-test-token-2wbh2 jenkins-home scripts init-configuration], unattached volumes=[operator-credentials jenkins-operator-test-token-2wbh2 jenkins-home scripts init-configuration]: timed out waiting for the condition
46s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
45s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 82.466053ms
45s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
45s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
43s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
42s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 100.307106ms
42s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
42s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
40s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-8fba5618-3d91-434d-a731-3deef54105e7"
36s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
35s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 98.318871ms
35s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
35s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
33s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
32s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 124.497736ms
32s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
32s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
26s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
25s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
25s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
25s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 81.137ms
23s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
23s         Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
23s         Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-8fba5618-3d91-434d-a731-3deef54105e7"
22s         Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 96.922779ms
22s         Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
20s         Warning   FailedMount              pod/jenkins-test                         Unable to attach or mount volumes: unmounted volumes=[jenkins-operator-test-token-2wbh2 jenkins-home scripts init-configuration operator-credentials], unattached volumes=[jenkins-operator-test-token-2wbh2 jenkins-home scripts init-configuration operator-credentials]: timed out waiting for the condition
16s         Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
13s         Warning   FailedMount              pod/jenkins-test                         Unable to attach or mount volumes: unmounted volumes=[scripts init-configuration operator-credentials jenkins-operator-test-token-2wbh2 jenkins-home], unattached volumes=[scripts init-configuration operator-credentials jenkins-operator-test-token-2wbh2 jenkins-home]: timed out waiting for the condition
6s          Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
5s          Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
5s          Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 220.237538ms
5s          Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
3s          Normal    Scheduled                pod/jenkins-test                         Successfully assigned jenkins/jenkins-test to ip-10-150-227-194.ec2.internal
3s          Normal    SuccessfulAttachVolume   pod/jenkins-test                         AttachVolume.Attach succeeded for volume "pvc-8fba5618-3d91-434d-a731-3deef54105e7"
2s          Warning   Failed                   pod/jenkins-test                         Error: cannot find volume "jenkins-home" to mount into container "jenkins-master"
2s          Normal    Pulling                  pod/jenkins-test                         Pulling image "jenkins/jenkins:2.277.4-lts-alpine"
2s          Normal    Pulled                   pod/jenkins-test                         Successfully pulled image "jenkins/jenkins:2.277.4-lts-alpine" in 90.776013ms

I'm also including the yaml for the failing jenkins-test pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2021-06-23T14:16:10Z"
  deletionGracePeriodSeconds: 30
  deletionTimestamp: "2021-06-23T14:16:41Z"
  labels:
    app: jenkins-operator
    jenkins-cr: test
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-06-23T14:16:10Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:jenkins-cr: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"d9bb0f2f-0deb-4577-a0c5-954620390848"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:containers:
          k:{"name":"jenkins-master"}:
            .: {}
            f:command: {}
            f:env:
              .: {}
              k:{"name":"COPY_REFERENCE_FILE_LOG"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"JAVA_OPTS"}:
                .: {}
                f:name: {}
                f:value: {}
              k:{"name":"JENKINS_HOME"}:
                .: {}
                f:name: {}
                f:value: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:livenessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":8080,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
              k:{"containerPort":50000,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:readinessProbe:
              .: {}
              f:failureThreshold: {}
              f:httpGet:
                .: {}
                f:path: {}
                f:port: {}
                f:scheme: {}
              f:initialDelaySeconds: {}
              f:periodSeconds: {}
              f:successThreshold: {}
              f:timeoutSeconds: {}
            f:resources:
              .: {}
              f:limits:
                .: {}
                f:cpu: {}
                f:memory: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/var/jenkins/init-configuration"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:readOnly: {}
              k:{"mountPath":"/var/jenkins/operator-credentials"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:readOnly: {}
              k:{"mountPath":"/var/jenkins/scripts"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:readOnly: {}
              k:{"mountPath":"/var/lib/jenkins"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:fsGroup: {}
          f:runAsUser: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"backup"}:
            .: {}
            f:name: {}
            f:persistentVolumeClaim:
              .: {}
              f:claimName: {}
          k:{"name":"init-configuration"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:name: {}
            f:name: {}
          k:{"name":"jenkins-home"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
          k:{"name":"operator-credentials"}:
            .: {}
            f:name: {}
            f:secret:
              .: {}
              f:defaultMode: {}
              f:secretName: {}
          k:{"name":"scripts"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:name: {}
            f:name: {}
    manager: manager
    operation: Update
    time: "2021-06-23T14:16:10Z"
  name: jenkins-test
  namespace: jenkins
  ownerReferences:
  - apiVersion: jenkins.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: Jenkins
    name: test
    uid: d9bb0f2f-0deb-4577-a0c5-954620390848
  resourceVersion: "38396237"
  selfLink: /api/v1/namespaces/jenkins/pods/jenkins-test
  uid: de42f516-c8e6-41ed-af16-f4d8db6472a1
spec:
  containers:
  - command:
    - bash
    - -c
    - /var/jenkins/scripts/init.sh && exec /sbin/tini -s -- /usr/local/bin/jenkins.sh
    env:
    - name: COPY_REFERENCE_FILE_LOG
      value: /var/lib/jenkins/copy_reference_file.log
    - name: JAVA_OPTS
      value: -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false
        -Djava.awt.headless=true
    - name: JENKINS_HOME
      value: /var/lib/jenkins
    - name: NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME
      value: web-devops-cluster
    - name: NEW_RELIC_METADATA_KUBERNETES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: NEW_RELIC_METADATA_KUBERNETES_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME
      value: jenkins-master
    - name: NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME
      value: jenkins/jenkins:2.277.4-lts-alpine
    image: jenkins/jenkins:2.277.4-lts-alpine
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 12
      httpGet:
        path: /login
        port: http
        scheme: HTTP
      initialDelaySeconds: 80
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: jenkins-master
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    - containerPort: 50000
      name: slavelistener
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /login
        port: http
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 1500m
        memory: 3Gi
      requests:
        cpu: "1"
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/jenkins
      name: jenkins-home
    - mountPath: /var/jenkins/scripts
      name: scripts
      readOnly: true
    - mountPath: /var/jenkins/init-configuration
      name: init-configuration
      readOnly: true
    - mountPath: /var/jenkins/operator-credentials
      name: operator-credentials
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jenkins-operator-test-token-2wbh2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-150-227-194.ec2.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
  serviceAccount: jenkins-operator-test
  serviceAccountName: jenkins-operator-test
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: jenkins-home
  - configMap:
      defaultMode: 511
      name: jenkins-operator-scripts-test
    name: scripts
  - configMap:
      defaultMode: 420
      name: jenkins-operator-init-configuration-test
    name: init-configuration
  - name: operator-credentials
    secret:
      defaultMode: 420
      secretName: jenkins-operator-credentials-test
  - name: backup
    persistentVolumeClaim:
      claimName: jenkins-backup
  - name: jenkins-operator-test-token-2wbh2
    secret:
      defaultMode: 420
      secretName: jenkins-operator-test-token-2wbh2
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-23T14:16:10Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-23T14:16:10Z"
    message: 'containers with unready status: [jenkins-master]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-23T14:16:10Z"
    message: 'containers with unready status: [jenkins-master]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-23T14:16:10Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: jenkins/jenkins:2.277.4-lts-alpine
    imageID: ""
    lastState: {}
    name: jenkins-master
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.150.227.194
  phase: Pending
  qosClass: Burstable
  startTime: "2021-06-23T14:16:10Z"

ambrons commented 3 years ago

After reading around and digesting what I'm seeing a bit more, I honestly think the issue is that the operator restarts Jenkins each time the env is set by the operator.

The values it calls out during the restart are here:

2021-06-23T15:00:43.573Z    INFO    controller-jenkins  Jenkins master pod restarted by operator: Env has changed to '[{Name:COPY_REFERENCE_FILE_LOG Value:/var/lib/jenkins/copy_reference_file.log ValueFrom:nil} {Name:JAVA_OPTS Value:-XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true ValueFrom:nil} {Name:JENKINS_HOME Value:/var/lib/jenkins ValueFrom:nil}]' in container 'jenkins-master'    {"cr": "test"}

This shows up in the operator log every 3 to 5 seconds.

The Jenkins Kind that was created by the operator helm chart is:

apiVersion: v1
items:
- apiVersion: jenkins.io/v1alpha2
  kind: Jenkins
  metadata:
    annotations:
      meta.helm.sh/release-name: jenkins-operator
      meta.helm.sh/release-namespace: jenkins
    creationTimestamp: "2021-06-23T14:55:34Z"
    generation: 4
    labels:
      app.kubernetes.io/managed-by: Helm
    managedFields:
    - apiVersion: jenkins.io/v1alpha2
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/managed-by: {}
        f:spec:
          .: {}
          f:configurationAsCode:
            .: {}
            f:configurations: {}
            f:secret:
              .: {}
              f:name: {}
          f:groovyScripts:
            .: {}
            f:configurations: {}
            f:secret:
              .: {}
              f:name: {}
          f:jenkinsAPISettings:
            .: {}
            f:authorizationStrategy: {}
          f:master:
            .: {}
            f:basePlugins: {}
            f:disableCSRFProtection: {}
            f:securityContext:
              .: {}
              f:fsGroup: {}
              f:runAsUser: {}
            f:volumes: {}
      manager: Go-http-client
      operation: Update
      time: "2021-06-23T14:59:45Z"
    - apiVersion: jenkins.io/v1alpha2
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:backup:
            .: {}
            f:action: {}
            f:containerName: {}
            f:interval: {}
            f:makeBackupBeforePodDeletion: {}
          f:master:
            f:containers: {}
          f:restore:
            .: {}
            f:action: {}
            f:containerName: {}
            f:getLatestAction: {}
          f:service:
            .: {}
            f:port: {}
            f:type: {}
          f:serviceAccount: {}
          f:slaveService:
            .: {}
            f:port: {}
            f:type: {}
        f:status:
          .: {}
          f:operatorVersion: {}
          f:provisionStartTime: {}
          f:userAndPasswordHash: {}
      manager: manager
      operation: Update
      time: "2021-06-23T14:59:45Z"
    name: test
    namespace: jenkins
    resourceVersion: "38419805"
    selfLink: /apis/jenkins.io/v1alpha2/namespaces/jenkins/jenkins/test
    uid: 727ade45-65ec-481c-b982-a4d0afcb4871
  spec:
    backup:
      action: {}
      containerName: ""
      interval: 0
      makeBackupBeforePodDeletion: false
    configurationAsCode:
      configurations: []
      secret:
        name: ""
    groovyScripts:
      configurations: []
      secret:
        name: ""
    jenkinsAPISettings:
      authorizationStrategy: createUser
    master:
      basePlugins:
      - name: kubernetes
        version: 1.29.6
      - name: workflow-job
        version: "2.41"
      - name: workflow-aggregator
        version: "2.6"
      - name: git
        version: 4.7.2
      - name: job-dsl
        version: "1.77"
      - name: configuration-as-code
        version: "1.51"
      - name: kubernetes-credentials-provider
        version: 0.18-1
      containers:
      - command:
        - bash
        - -c
        - /var/jenkins/scripts/init.sh && exec /sbin/tini -s -- /usr/local/bin/jenkins.sh
        env:
        - name: JAVA_OPTS
          value: -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false
            -Djava.awt.headless=true
        - name: JENKINS_HOME
          value: /var/lib/jenkins
        image: jenkins/jenkins:2.277.4-lts-alpine
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 12
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 80
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: jenkins-master
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /login
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 1500m
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 500Mi
      disableCSRFProtection: false
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
      volumes:
      - name: backup
        persistentVolumeClaim:
          claimName: jenkins-backup
    restore:
      action: {}
      containerName: ""
      getLatestAction: {}
    service:
      port: 8080
      type: ClusterIP
    serviceAccount: {}
    slaveService:
      port: 50000
      type: ClusterIP
  status:
    operatorVersion: v0.6.0
    provisionStartTime: "2021-06-23T15:04:03Z"
    userAndPasswordHash: yi7MWMy4/o+USojlYgUfDXxsJJC2jeMBhiggCr84yng=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

So I assume that it's using the Jenkins object to set these values. The way I see it (making some assumptions here):

ambrons commented 3 years ago

So as @prryb and @thecooldrop stated in the beginning it turned out to be environmental.

So my last message about being stuck in an endless loop on the env change was correct. What I got wrong was how I read the message in the operator log, which showed:

2021-06-23T15:00:43.573Z    INFO    controller-jenkins  Jenkins master pod restarted by operator: Env has changed to '[{Name:COPY_REFERENCE_FILE_LOG Value:/var/lib/jenkins/copy_reference_file.log ValueFrom:nil} {Name:JAVA_OPTS Value:-XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true ValueFrom:nil} {Name:JENKINS_HOME Value:/var/lib/jenkins ValueFrom:nil}]' in container 'jenkins-master'    {"cr": "test"}

We have New Relic installed, and it injects the following env vars into every container, including the Jenkins pod managed by the operator.

Adding this to the jenkins.env section is a workaround:

 env:
    - name: NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME
      value: "CLUSTER_NAME_GOES_HERE"
    - name: NEW_RELIC_METADATA_KUBERNETES_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: "v1"
          fieldPath: spec.nodeName
    - name: NEW_RELIC_METADATA_KUBERNETES_NAMESPACE_NAME
      valueFrom:
        fieldRef:
          apiVersion: "v1"
          fieldPath: metadata.namespace
    - name: NEW_RELIC_METADATA_KUBERNETES_POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: "v1"
          fieldPath: metadata.name
    - name: NEW_RELIC_METADATA_KUBERNETES_CONTAINER_NAME
      value: "master"
    - name: NEW_RELIC_METADATA_KUBERNETES_CONTAINER_IMAGE_NAME
      value: "jenkins/jenkins:2.277.4-lts-alpine"

Is there a better way to handle this? My concern is that New Relic releases a new version that adds a new env var or changes the format of an existing one so that they no longer match, and I'm back to square one.

Is there a way to set up an ignore list for env vars? If nothing else, the log message didn't include the NEW_RELIC_* env vars in its list, so it was only after beating my head against the wall that I was able to find the root cause.

In short, I think there's room for improvement here.
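
To make the ignore-list idea concrete, here is a rough sketch of a comparison that skips injected variables and reports which names actually differ. It is Python for illustration only (the operator itself is written in Go), and `IGNORED_PREFIXES`, `without_ignored` and `env_diff` are hypothetical names, not existing operator settings or functions.

```python
from typing import Dict, Iterable, List, Tuple

EnvVar = Dict[str, object]  # e.g. {"name": "JENKINS_HOME", "value": "/var/lib/jenkins"}

# Hypothetical setting: name prefixes of variables that other controllers may inject.
IGNORED_PREFIXES: Tuple[str, ...] = ("NEW_RELIC_METADATA_KUBERNETES_",)


def without_ignored(env: Iterable[EnvVar], prefixes: Tuple[str, ...]) -> List[EnvVar]:
    """Drop variables whose name starts with one of the ignored prefixes."""
    return [e for e in env if not str(e["name"]).startswith(prefixes)]


def env_diff(desired: List[EnvVar], actual: List[EnvVar],
             prefixes: Tuple[str, ...] = IGNORED_PREFIXES) -> Dict[str, List[str]]:
    """Return the names that are missing, unexpected, or changed after filtering."""
    want = {str(e["name"]): e for e in without_ignored(desired, prefixes)}
    have = {str(e["name"]): e for e in without_ignored(actual, prefixes)}
    return {
        "missing": sorted(want.keys() - have.keys()),
        "unexpected": sorted(have.keys() - want.keys()),
        "changed": sorted(n for n in want.keys() & have.keys() if want[n] != have[n]),
    }


desired = [{"name": "JENKINS_HOME", "value": "/var/lib/jenkins"}]
actual = desired + [{"name": "NEW_RELIC_METADATA_KUBERNETES_CLUSTER_NAME",
                     "value": "web-devops-cluster"}]

print(env_diff(desired, actual))               # nothing differs once the prefix is ignored
print(env_diff(desired, actual, prefixes=()))  # the injected name shows up as unexpected
```

Reporting the differing names, rather than only the desired env, would also have pointed straight at the NEW_RELIC_* variables.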

Thanks again for rubber-ducking this for me guys!

doncorsean commented 3 years ago

Thx to @thecooldrop & @prryb for chiming in about it working well for you. This caused us to redouble our efforts and led us to confirm it was an environmental issue with our cluster. We were about ready to throw in the towel as the error about jenkins home volume was misleading. Your comments gave us some renewed hope. I updated the title to reflect the nature of the issue better in case someone else encounters a similar situation.

thecooldrop commented 3 years ago

Thank you @doncorsean, your comment has made my evening and rekindled my hope that I will be able to fully integrate this product in my team in a professional setup as well.

I had issues with the home volume as well, and it was quite unclear to me why I cannot just mount some PV, but then I dug into older issues and discovered that it is actually a feature to retain immutability.

Also a tip: upgrade the Kubernetes plugin to version 1.30.0. It seems like 1.29.6 (or 1.29.7, whichever is actually the latest 1.29.x version) has a build problem and runs into NoSuchMethodExceptions when spawned Pods try to connect back to the master.

ambrons commented 3 years ago

@thecooldrop thanks for the tip!

Both @doncorsean and I went through that mental shift ourselves. With the CasC (Configuration as Code) plugin, we think an immutable Jenkins home is a great idea.

mortenbirkelund commented 3 years ago

I don't understand what the solution to this issue is. If I try to set the following env variable in the values.yaml for Helm, Jenkins goes into a restart loop.

    # env contains jenkins container environment variables
    env:
    - name: DD_AGENT_HOST
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP

Because the env changes.

2021-10-05T17:58:35.957Z INFO controller-jenkins Jenkins master pod restarted by operator: Env has changed to '[{Name:COPY_REFERENCE_FILE_LOG Value:/var/lib/jenkins/copy_reference_file.log ValueFrom:nil} {Name:DD_AGENT_HOST Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:,FieldPath:status.hostIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {Name:JAVA_OPTS Value:-XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=80.0 -Djenkins.install.runSetupWizard=false -Djava.awt.headless=true ValueFrom:nil} {Name:JENKINS_HOME Value:/var/lib/jenkins ValueFrom:nil}]' in container 'jenkins-master' {"cr": "master"}
2021-10-05T17:58:35.982Z DEBUG controller-jenkins Reconciling Jenkins {"cr": "master"}

What should I set in order for this to work with jenkins-operator not restarting jenkins?

adaphi commented 2 years ago

Just want to add an answer for the previous comment, in case anyone else winds up here. We had a very similar issue with DataDog env vars. Turns out that this is caused by the automatic addition of the apiVersion field, which causes the operator to see a difference. In our case we went from:

    - name: DATADOG_JENKINS_PLUGIN_TARGET_HOST
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP

to:

    - name: DATADOG_JENKINS_PLUGIN_TARGET_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP

which fixes it.
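
For anyone wondering why the extra line matters: the Kubernetes API server defaults a `fieldRef` without `apiVersion` to `v1`, so the live pod never equals a spec that omits it. A minimal sketch of that mismatch, in Python for illustration only (the operator is written in Go) and with a hypothetical helper name:

```python
import copy
from typing import List


def with_default_api_version(env: List[dict]) -> List[dict]:
    """Return a copy of env in which every fieldRef lacking apiVersion gets "v1"."""
    normalized = copy.deepcopy(env)
    for var in normalized:
        field_ref = (var.get("valueFrom") or {}).get("fieldRef")
        if field_ref is not None and not field_ref.get("apiVersion"):
            field_ref["apiVersion"] = "v1"
    return normalized


# What was written in the values file (no apiVersion).
from_cr = [{"name": "DATADOG_JENKINS_PLUGIN_TARGET_HOST",
            "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}}]
# What the live pod reports after API server defaulting.
from_pod = [{"name": "DATADOG_JENKINS_PLUGIN_TARGET_HOST",
             "valueFrom": {"fieldRef": {"apiVersion": "v1", "fieldPath": "status.hostIP"}}}]

print(from_cr == from_pod)                            # a strict comparison sees a change
print(with_default_api_version(from_cr) == from_pod)  # equal once the default is applied
```

Writing `apiVersion: v1` yourself, as above, has the same effect as the normalization step in this sketch.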

mortenbirkelund commented 2 years ago

@adaphi Awesome. Thanks for taking the time to share that workaround.

liujiekm commented 6 months ago

Why was this issue closed? The same thing happens with jenkins-operator deployed to EKS. We want the Jenkins master to use IRSA to integrate with AWS ASG, so the AWS credential-related env vars and volumes are dynamically attached to the pod, which also causes a restart loop. Is there a fix?

brokenpip3 commented 6 months ago

dynamically attached to POD

By what? A mutating webhook, or something else? Issue reopened.

liujiekm commented 6 months ago

This sort of webhook: https://github.com/aws/amazon-eks-pod-identity-webhook/
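
A quick way to see what a mutating webhook added is to compare the names in the desired pod spec with those in the live one. The sketch below is Python for illustration only; `injected` is a hypothetical helper, and the `AWS_ROLE_ARN`, `AWS_WEB_IDENTITY_TOKEN_FILE` and `aws-iam-token` names are what I understand this webhook to add, so treat them as an assumption and check your own pod.

```python
from typing import Dict, List, Set


def names(items: List[dict]) -> Set[str]:
    """Collect the "name" field of each entry."""
    return {item["name"] for item in items}


def injected(desired_spec: dict, live_spec: dict, container: str) -> Dict[str, List[str]]:
    """List env, volumeMount and volume names present only in the live pod spec."""
    def pick(spec: dict) -> dict:
        return next(c for c in spec.get("containers", []) if c["name"] == container)

    want, have = pick(desired_spec), pick(live_spec)
    return {
        "env": sorted(names(have.get("env", [])) - names(want.get("env", []))),
        "volumeMounts": sorted(names(have.get("volumeMounts", []))
                               - names(want.get("volumeMounts", []))),
        "volumes": sorted(names(live_spec.get("volumes", []))
                          - names(desired_spec.get("volumes", []))),
    }


desired_spec = {
    "containers": [{"name": "jenkins-master",
                    "env": [{"name": "JENKINS_HOME", "value": "/var/lib/jenkins"}],
                    "volumeMounts": [{"name": "jenkins-home"}]}],
    "volumes": [{"name": "jenkins-home", "emptyDir": {}}],
}
live_spec = {
    "containers": [{"name": "jenkins-master",
                    "env": [{"name": "JENKINS_HOME", "value": "/var/lib/jenkins"},
                            {"name": "AWS_ROLE_ARN", "value": "example"},
                            {"name": "AWS_WEB_IDENTITY_TOKEN_FILE", "value": "example"}],
                    "volumeMounts": [{"name": "jenkins-home"}, {"name": "aws-iam-token"}]}],
    "volumes": [{"name": "jenkins-home", "emptyDir": {}}, {"name": "aws-iam-token"}],
}

print(injected(desired_spec, live_spec, "jenkins-master"))
```

The live spec can be taken from `kubectl get pod <name> -o json` (the `.spec` part); anything this reports is a candidate for the restart loop described in this issue.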

DionJones615 commented 5 months ago

I attempted to configure an ephemeral ebs volume under master in the CRD, but this has the same issue because the volume config triggers the ebs-csi-driver to generate the volume in AWS and then update the pod with the volume details for mounting, which triggers jenkins-operator to restart, and round and round it goes.

Looks like it's configured here - https://github.com/jenkinsci/kubernetes-operator/blob/master/pkg/configuration/base/pod.go#L22

@brokenpip3 Would you be amenable to some configuration around disabling these checks individually? Maybe a list of disabled_reconciliations and a contains() check on each conditional?

I scrolled through the issues and I believe these are all related - #361 #368 #733
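
To show what I mean by the disabled_reconciliations list with a contains() check, here is a rough sketch. It is Python for illustration only and does not mirror the actual code in pod.go; the check names and the `restart_reasons` function are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical checks: each returns True when the desired and live values differ.
CHECKS: Dict[str, Callable[[dict, dict], bool]] = {
    "image": lambda desired, live: desired.get("image") != live.get("image"),
    "env": lambda desired, live: desired.get("env", []) != live.get("env", []),
    "volumes": lambda desired, live: desired.get("volumes", []) != live.get("volumes", []),
}


def restart_reasons(desired: dict, live: dict,
                    disabled_reconciliations: Iterable[str] = ()) -> List[str]:
    """Return the names of enabled checks that found a difference."""
    disabled = set(disabled_reconciliations)
    return [name for name, differs in CHECKS.items()
            if name not in disabled and differs(desired, live)]


desired = {"image": "jenkins/jenkins:2.277.4-lts-alpine",
           "env": [{"name": "JENKINS_HOME", "value": "/var/lib/jenkins"}]}
live = {"image": "jenkins/jenkins:2.277.4-lts-alpine",
        "env": desired["env"] + [{"name": "INJECTED_BY_WEBHOOK", "value": "x"}]}

print(restart_reasons(desired, live))           # the env check asks for a restart
print(restart_reasons(desired, live, ["env"]))  # no restart once the env check is disabled
```

The trade-off is that a disabled check also stops catching real drift in that field, so per-name ignore lists might be a finer-grained alternative.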

brokenpip3 commented 5 months ago

Would you be amenable to some configuration around disabling these checks individually? Maybe a list of disabled_reconciliations and a contains() check on each conditional?

Yes, it's in my plans. As soon as I find the time to complete 0.9.0, I will introduce several changes. Unfortunately I have limited free time lately, so I need to choose where to put my effort, and moving to the new versions of Go and operator-sdk is priority number one right now for several reasons. Sorry about that.

DionJones615 commented 4 months ago

Yes, it's in my plans. As soon as I find the time to complete 0.9.0, I will introduce several changes. Unfortunately I have limited free time lately, so I need to choose where to put my effort, and moving to the new versions of Go and operator-sdk is priority number one right now for several reasons. Sorry about that.

No worries; I understand your priorities. I can assist here; just want to make sure we agree on the right course. :)