jenkinsci / docker

Docker official jenkins repo
https://hub.docker.com/r/jenkins/jenkins
MIT License

Jenkins spins up an unnecessarily large number of Docker cloud instances for agent label expressions #1274

Closed tobiasherzke closed 2 years ago

tobiasherzke commented 2 years ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.319.2
OS: Linux - 5.4.0-94-lowlatency
---
ace-editor:1.1
ant:1.13
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
bootstrap4-api:4.6.0-3
bootstrap5-api:5.1.3-4
bouncycastle-api:2.25
branch-api:2.7.0
build-timeout:1.20
caffeine-api:2.9.2-29.v717aac953ff3
checks-api:1.7.2
cloudbees-folder:6.17
command-launcher:1.6
credentials:1074.v60e6c29b_b_44b_
credentials-binding:1.27.1
display-url-api:2.3.5
docker-commons:1.18
docker-java-api:3.1.5.2
docker-plugin:1.2.6
durable-task:493.v195aefbb0ff2
echarts-api:5.2.2-2
email-ext:2.86
font-awesome-api:5.15.4-5
git:4.10.3
git-client:3.11.0
git-server:1.10
github:1.34.1
github-api:1.301-378.v9807bd746da5
github-branch-source:2.11.4
gradle:1.38
handlebars:3.0.8
jackson2-api:2.13.1-246.va8a9f3eaf46a
javax-activation-api:1.2.0-2
javax-mail-api:1.6.2-5
jaxb:2.3.0.1
jdk-tool:1.5
jjwt-api:0.11.2-9.c8b45b8bb173
jquery3-api:3.6.0-2
jsch:0.1.55.2
junit:1.53
ldap:2.7
lockable-resources:2.13
mailer:408.vd726a_1130320
matrix-auth:3.0
matrix-project:1.20
momentjs:1.1.1
okhttp-api:4.9.3-105.vb96869f8ac3a
pam-auth:1.6.1
pipeline-build-step:2.15
pipeline-github-lib:1.0
pipeline-graph-analysis:188.v3a01e7973f2c
pipeline-input-step:427.va6441fa17010
pipeline-milestone-step:1.3.2
pipeline-model-api:1.9.3
pipeline-model-definition:1.9.3
pipeline-model-extensions:1.9.3
pipeline-rest-api:2.20
pipeline-stage-step:291.vf0a8a7aeeb50
pipeline-stage-tags-metadata:1.9.3
pipeline-stage-view:2.20
plain-credentials:1.7
plugin-util-api:2.12.0
popper-api:1.16.1-2
popper2-api:2.11.2-1
resource-disposer:0.17
scm-api:595.vd5a_df5eb_0e39
script-security:1131.v8b_b_5eda_c328e
snakeyaml-api:1.29.1
ssh-credentials:1.19
ssh-slaves:1.33.0
sshd:3.1.0
structs:308.v852b473a2b8c
timestamper:1.16
token-macro:267.vcdaea6462991
trilead-api:1.0.13
workflow-aggregator:2.6
workflow-api:1122.v7a_916f363c86
workflow-basic-steps:2.24
workflow-cps:2648.va9433432b33c
workflow-cps-global-lib:552.vd9cc05b8a2e1
workflow-durable-task-step:1121.va_65b_d2701486
workflow-job:1145.v7f2433caa07f
workflow-multibranch:706.vd43c65dec013
workflow-scm-step:2.13
workflow-step-api:622.vb_8e7c15b_c95a_
workflow-support:813.vb_d7c3d2984a_0
ws-cleanup:0.40
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Ubuntu 20.04

Reproduction steps

Steps 1-5 create a clean and current Jenkins test instance. Steps 6-8 create a Docker agent template. Step 9 creates a test job, and step 10 executes it.

  1. Have a Linux computer with Docker. Ubuntu 20.04 with package docker.io installed will do.
  2. Spin up a temporary Jenkins test instance with the help of Docker and give it access to the host's Docker socket: `sudo docker run -p 8080:8080 --rm -v /var/run/docker.sock:/var/run/docker.sock --user 1000:$(stat -c%g /var/run/docker.sock) jenkins/jenkins:lts-jdk11`. This will print the initial admin password to the terminal where it executes.
  3. Take ownership of the new Jenkins instance by pointing a web browser to http://localhost:8080/. Install recommended plugins, skip creating a user.
  4. To configure a Docker cloud and a Docker agent template, open http://localhost:8080/configureClouds.
  5. This will first delegate to the plugin installation page. Install the docker plugin, activate without restart.
  6. Go back to http://localhost:8080/configureClouds
  7. Create a new Docker cloud: Set socket to unix:/var/run/docker.sock, tick the Enabled checkbox, set Container Cap to 10.
  8. Create a Docker Agent template within this cloud. Any image with a suitable JRE should do. For this test, I choose the image openjdk:latest. Give it multiple labels. I use this list of labels for this test: java openjdk x86_64 ubuntu. Save the cloud configuration.
  9. Create a test job. Choose "Pipeline" job. In the Pipeline Script field, insert the "Hello World" sample pipeline from the drop-down menu. Then change the line `agent any` into `agent {label 'java && openjdk && ubuntu'}`. Save the new job.
  10. Execute the test job by clicking "Build Now", then quickly switch to the dashboard of Jenkins in order to watch the behaviour of the build executors.

Expected Results

Jenkins should spin up a single container and execute the test job within this container.
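To illustrate why a single container should be enough, here is a minimal Python sketch that checks the label expression from step 9 against the template labels from step 8. It is an illustration only, not the Jenkins or docker-plugin implementation: the function name `template_matches` is hypothetical, and it handles only plain `&&` conjunctions rather than the full Jenkins label expression grammar.

```python
def template_matches(expression: str, template_labels: str) -> bool:
    """Return True if every label joined by '&&' is present on the template.

    Simplification (assumption): only plain '&&' conjunctions are handled,
    not operators such as '||', '!' or parentheses.
    """
    required = {term.strip() for term in expression.split("&&")}
    available = set(template_labels.split())
    return required <= available


TEMPLATE_LABELS = "java openjdk x86_64 ubuntu"  # labels from step 8
JOB_EXPRESSION = "java && openjdk && ubuntu"    # expression from step 9

# One agent created from the template carries all requested labels.
print(template_matches(JOB_EXPRESSION, TEMPLATE_LABELS))
```

Because the one template already carries every label in the expression, one agent from that template can serve the queued build.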

Actual Results

Jenkins spins up 10 containers from the same image in quick succession and keeps them idle while the test build remains in the build queue. Hovering over the entry in the build queue shows the tooltip text "All nodes of label java && openjdk && ubuntu are offline".

10 containers is the Container Cap for this cloud.

Only once all 10 containers have been allocated does the job execute on one of them. That container exits quickly after the job is done. The other containers remain and stay idle for some time.
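The number of containers can also be checked from the host rather than from the Jenkins dashboard. The following Python sketch assumes the `docker` CLI is on the PATH and that the current user may talk to the Docker daemon (otherwise run it with `sudo`). The helper names are hypothetical. It counts the running containers that Docker reports for a given image via `docker ps --filter ancestor=...`.

```python
import subprocess


def count_containers(ps_output: str) -> int:
    """Count non-empty lines, one per container ID printed by `docker ps`."""
    return sum(1 for line in ps_output.splitlines() if line.strip())


def running_containers_for_image(image: str) -> int:
    """Ask the local Docker daemon how many running containers use `image`."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"ancestor={image}", "--format", "{{.ID}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_containers(result.stdout)
```

Calling `running_containers_for_image("openjdk:latest")` shortly after clicking "Build Now" should return 1 under the expected behaviour, and would return 10 under the behaviour described here, assuming no unrelated containers from the same image are running on the host.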

Anything else?

I believe this is the result of a recent change: container allocation behaved as expected until recently.

I reproduced this in a temporary Docker-based Jenkins instance in order to check if this is reproducible outside my Jenkins instance. I hope this does not make it too complicated to follow.

timja commented 2 years ago

This repo is only for the Docker images, not the Docker plugins.

timja commented 2 years ago

Possibly related: https://issues.jenkins.io/browse/JENKINS-67635

tobiasherzke commented 2 years ago

Sorry, and thanks for the pointer.