jenkinsci / docker-plugin

Jenkins cloud plugin that uses Docker
https://plugins.jenkins.io/docker-plugin/
MIT License
489 stars 319 forks

Docker builds do not work with matrix project plugin #242

Closed samrocketman closed 2 years ago

samrocketman commented 9 years ago

When launching a build for a matrix project, the job waits indefinitely on slaves and the docker plugin does not spin up any containers. I have published scripts which automatically generate a matrix build job if you wish to test this.

https://github.com/samrocketman/jenkins-bootstrap-jervis

  1. Export a GitHub API token.
  2. Provision a Jenkins instance with ./jervis_bootstrap.sh
  3. When Jenkins is up, press the build button on "Generate Jenkins job from YAML" and generate the project samrocketman/jervis.
  4. Visit "GitHub Organizations" tab and see the jervis-master job for a working builder.
  5. Configure the docker plugin and update the samrocketman/jervis-master Jenkins job to use a docker slave.

You'll notice that it just stays in the queue indefinitely because jobs never appear.

Duplicate issue here: https://issues.jenkins-ci.org/browse/JENKINS-28661

thecere commented 9 years ago

I've had issues with Matrix-Plugin + Labels that could be resolved by returning the label via a Groovy script. Maybe that helps...

"Groovy script to restrict where this project can be run":

    return "LABEL_FOR_DOCKER_IMAGE";

samrocketman commented 9 years ago

Where do I configure groovy to return labels? I've not configured labels in that manner before.

KostyaSha commented 9 years ago

Please provide the cloud section of config.xml.

samrocketman commented 9 years ago

I'll have to follow up after work.

thecere commented 9 years ago

Sorry, I'll just follow up here, since I have no account at https://issues.jenkins-ci.org.

For the groovy-stuff, you would need another plugin: https://wiki.jenkins-ci.org/display/JENKINS/Groovy+Label+Assignment+plugin

KostyaSha commented 9 years ago

Removed useless comments.

samrocketman commented 9 years ago

Here is the source of the docker container being used: jervis-docker-jvm. You can automatically generate a matrix job from the following automatic Jenkins bootstrapper: jenkins-bootstrap-jervis. After I generated the samrocketman/jervis job I manually configured the job GitHub Organizations/samrocketman/jervis (master branch) to pin the job to the label jervis-docker. If you like, I can attach a JENKINS_HOME with everything configured for testing.

The following is the clouds section of config.xml. The password to connect to the container is jenkins:jenkins.

  <clouds>
    <com.nirima.jenkins.plugins.docker.DockerCloud plugin="docker-plugin@0.9.2">
      <name>docker-local</name>
      <templates>
        <com.nirima.jenkins.plugins.docker.DockerTemplate>
          <image>jervis-docker-jvm:latest</image>
          <dockerCommand>/sbin/my_init</dockerCommand>
          <lxcConfString></lxcConfString>
          <hostname></hostname>
          <dnsHosts/>
          <volumes/>
          <volumesFrom2/>
          <environment/>
          <bindPorts></bindPorts>
          <bindAllPorts>false</bindAllPorts>
          <privileged>false</privileged>
          <tty>false</tty>
          <labelString>jervis-docker</labelString>
          <credentialsId>3fa43af2-5ca3-45ef-b4d4-2f914fef2959</credentialsId>
          <idleTerminationMinutes>5</idleTerminationMinutes>
          <sshLaunchTimeoutMinutes>15</sshLaunchTimeoutMinutes>
          <jvmOptions></jvmOptions>
          <javaPath></javaPath>
          <prefixStartSlaveCmd></prefixStartSlaveCmd>
          <suffixStartSlaveCmd></suffixStartSlaveCmd>
          <remoteFsMapping></remoteFsMapping>
          <remoteFs>/home/jenkins</remoteFs>
          <instanceCap>50</instanceCap>
          <mode>NORMAL</mode>
          <retentionStrategy class="com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy">
            <idleMinutes>0</idleMinutes>
            <idleMinutes defined-in="com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy">0</idleMinutes>
          </retentionStrategy>
          <numExecutors>1</numExecutors>
        </com.nirima.jenkins.plugins.docker.DockerTemplate>
      </templates>
      <serverUrl>http://127.0.0.1:4243</serverUrl>
      <containerCap>50</containerCap>
      <connectTimeout>5</connectTimeout>
      <readTimeout>15</readTimeout>
      <version></version>
      <credentialsId></credentialsId>
    </com.nirima.jenkins.plugins.docker.DockerCloud>
  </clouds>

samrocketman commented 9 years ago

I've attached config files in the duplicate JIRA issue since GitHub issues doesn't do file attachments. Also, I have the following plugins installed (besides the default).

Installed plugins:
Plugin:docker-plugin
Plugin:rich-text-publisher-plugin
Plugin:cloudbees-folder
Plugin:git
Plugin:job-dsl
Plugin:durable-task
Plugin:script-security
Plugin:dashboard-view
Plugin:github
Plugin:view-job-filters
Plugin:groovy
Plugin:console-column-plugin
Plugin:github-api
Plugin:git-client
Plugin:embeddable-build-status
Plugin:github-oauth

samrocketman commented 9 years ago

jenkins-bootstrap-jervis now bootstraps with docker configured, so you can use that for testing. The only things you need are docker installed and the jervis-docker-jvm docker image.

KostyaSha commented 9 years ago

Hi, is this still relevant?

samrocketman commented 9 years ago

Have you tried using the bootstrapping scripts? They're pretty straightforward and great for testing this issue. You get matrix jobs out of the box with my instructions.

It is likely still relevant, as it was the case when I tested it a few weeks ago. Right now I'm in the process of moving over 4k miles, so I don't have time to test it. I recommend not closing this issue until it can be validated.

thomassuckow commented 9 years ago

If the groovy trick did work, I would suspect it is https://issues.jenkins-ci.org/browse/JENKINS-27034

masakura commented 9 years ago

I have the same problem.

I tried to use groovy script label, but failed.

masakura commented 9 years ago

Also, I set "Groovy script to restrict where this project can be run" like this:

    return currentJob.getClass().toString() == "class hudson.matrix.MatrixProject" ? "master" : "docker";

Though it may not be a good way, it worked.
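
For readers following along, the decision that script makes can be restated outside Jenkins. This is only an illustrative Python sketch, not something Jenkins runs: pick_label is a hypothetical name, and hudson.matrix.MatrixConfiguration is assumed here to be the class of the per-axis child builds.

```python
# Illustrative only: mirrors the Groovy expression above; Jenkins does not run this.
MATRIX_PARENT = "class hudson.matrix.MatrixProject"


def pick_label(job_class: str) -> str:
    """Return the label the script would assign for a given job class string."""
    # The matrix parent stays on the master; every other job type gets the docker label.
    return "master" if job_class == MATRIX_PARENT else "docker"


if __name__ == "__main__":
    # The second class name is an assumption about how child builds are typed.
    for cls in (MATRIX_PARENT, "class hudson.matrix.MatrixConfiguration"):
        print(cls, "->", pick_label(cls))
```

In other words, only the parent job is pinned to the master, while anything else is sent to the docker label.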

samrocketman commented 9 years ago

I'd rather it behave the same way as non-matrix jobs: set it and forget it. Additionally, that groovy expression has the build occurring on the master node instead of the docker container. It is neither a workaround nor a fix.

samrocketman commented 9 years ago

Seems this is an issue in Jenkins core: jenkinsci/jenkins#1815. Once that gets merged this should be sorted out. I recommend this issue remain open until that gets merged and the change is verified.

Thanks @jglick

samrocketman commented 9 years ago

@KostyaSha I've tested this since jenkinsci/jenkins#1815 was merged. It seems the latest Jenkins 1.635 includes the patch. This is still an issue because the matrix build doesn't utilize the slave created by the docker plugin. It would be nice if it would create one slave per flyweight task (i.e. there are 8 builds in the matrix, so it should create 8 docker containers if capacity allows it).
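
To make the requested behavior concrete, here is a rough Python sketch of "one container per matrix build, if capacity allows". It describes the wish, not what the docker plugin does. The axis names and values are made up for illustration, and matrix_combinations and containers_to_request are hypothetical helper names.

```python
from itertools import product


def matrix_combinations(axes: dict) -> list:
    """Expand axis name -> values into one dict per matrix combination."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*(axes[n] for n in names))]


def containers_to_request(axes: dict, container_cap: int, running: int = 0) -> int:
    """Containers to ask for: one per combination, limited by the remaining cap."""
    wanted = len(matrix_combinations(axes))
    free = max(container_cap - running, 0)
    return min(wanted, free)


if __name__ == "__main__":
    # Hypothetical axes that expand to 8 builds (2 x 4).
    axes = {"jdk": ["openjdk7", "oraclejdk8"], "env": ["a", "b", "c", "d"]}
    print(len(matrix_combinations(axes)), containers_to_request(axes, container_cap=100))
```

With a cap lower than the number of combinations, the remaining builds would simply have to wait for a container to free up.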

Test environment includes:

Host information:
Ubuntu 14.04.3 LTS 
Linux 3.13.0-66-generic x86_64
Docker version 1.7.1, build 786b29d
Jenkins ver. 1.635
Docker plugin ver. 0.15.0

Docker slave: https://github.com/samrocketman/docker-jenkins-jervis

Configuration steps:

  1. Edit /etc/default/docker and add the following environment variable:

    export DOCKER_OPTS="--host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:4243"
  2. Execute "restart docker" to restart the docker daemon.
  3. All following steps are for configuration done on the Jenkins host.
  4. Add docker cloud from configuration.
    • Name: localhost docker
    • Docker URL: http://127.0.0.1:4243
    • Container Cap: 100
    • All other settings for cloud left default.
  5. Add docker template to localhost docker cloud with the following settings (expand Container settings).

    • Docker image: jervis-docker-jvm
    • Docker command: /sbin/my_init
    • Volumes: (volumes not required; I'm just caching for builds)

      /home/sam/.cache/docker/maven:/home/jenkins/.m2
      /home/sam/.cache/docker/gradle:/home/jenkins/.gradle
    • Instance Capacity: 20
    • Remote File System Root: /home/jenkins
    • Labels: stable docker ubuntu1404 sudo env jdk language:groovy
    • Credentials: username:password credentials set to jenkins:jenkins
    • All other settings for cloud template left default.
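
Before debugging Jenkins itself, it can help to confirm that the daemon really listens on the TCP address from step 1. A minimal Python sketch, assuming the Docker remote API answers GET /_ping on that address; ping_url and docker_reachable are hypothetical helper names, not part of any plugin.

```python
import http.client
import urllib.request


def ping_url(docker_url: str) -> str:
    """Build the ping endpoint URL for a Docker remote API base URL."""
    return docker_url.rstrip("/") + "/_ping"


def docker_reachable(docker_url: str, timeout: float = 5.0) -> bool:
    """Return True if the daemon answers the ping endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url(docker_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or malformed reply.
        return False


if __name__ == "__main__":
    # Same URL as the "Docker URL" setting above.
    print(docker_reachable("http://127.0.0.1:4243"))
```

If this prints False, the problem is in the daemon configuration rather than in the matrix job.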

Here's a screenshot of the described behavior. The slave is started and ready but the matrix job isn't making use of it (even though it claims to be waiting on it).

[Screenshot: snapshot4]

samrocketman commented 9 years ago

Possibly related. https://issues.jenkins-ci.org/browse/JENKINS-22494

jpfeuffer commented 8 years ago

I am currently experiencing exactly the same issue. The images are started as slaves, have all the correct labels and an idle executor, but are not used by the pending matrix-configuration job that led to the launch of this image. Has any workaround been found in the meantime?

samrocketman commented 8 years ago

Unfortunately, no workaround. I just tested the latest version of Jenkins (1.647) and docker-plugin (0.16.0). The issue continues to be a problem for me.

KostyaSha commented 8 years ago

@samrocketman please provide config.xml for this matrix job.

KostyaSha commented 8 years ago

Ok, debugged. https://github.com/jenkinsci/docker-plugin/issues/242#issuecomment-135603387 is the best solution at the moment. It's not a docker-plugin issue, it's jenkins and matrix-project design.

samrocketman commented 8 years ago

https://github.com/jenkinsci/docker-plugin/issues/242#issuecomment-135603387 is not the best solution. You're telling me running builds on the master is the accepted solution? That's a known bad practice because running builds on the master means that master configuration can be changed by the job.

This should remain open because it continues to be an issue. @KostyaSha please reopen this issue.

KostyaSha commented 8 years ago

You are wrong, reopened.

samrocketman commented 8 years ago

Why am I wrong? If the cause of the issue is, "jenkins and matrix-project design," then it should be fixed even if the design needs to be changed. The behavior diverges from, say, freestyle projects which build with docker cloud plugin just fine.

KostyaSha commented 8 years ago

MatrixProject build should happen on master and not on docker slaves.

samrocketman commented 8 years ago

I disagree. Running any job on the master is risky and should be avoided as a best practice. Preferably any master should have executors set to 0. Otherwise, you could have a build job helpfully disable the security on your master by modifying the useSecurity field in config.xml on the master. I work with Jenkins where the jobs are not necessarily trusted.
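
As a rough illustration of the two settings mentioned here, the Python sketch below reads numExecutors and useSecurity from a config.xml string. The sample XML is a trimmed, made-up stand-in for a real master config.xml, and master_report is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for $JENKINS_HOME/config.xml.
SAMPLE_CONFIG = """\
<hudson>
  <numExecutors>2</numExecutors>
  <useSecurity>true</useSecurity>
</hudson>
"""


def master_report(config_xml: str) -> dict:
    """Report executor count and security flag; None means the element is absent."""
    root = ET.fromstring(config_xml)
    executors = root.findtext("numExecutors")
    security = root.findtext("useSecurity")
    return {
        "executors": int(executors) if executors is not None else None,
        "security_enabled": (security == "true") if security is not None else None,
    }


if __name__ == "__main__":
    print(master_report(SAMPLE_CONFIG))
```

A master with a non-zero executor count can run builds, and those builds can reach the same file this sketch reads.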

In any case, I've found GroovyAxis: https://wiki.jenkins-ci.org/display/JENKINS/GroovyAxis. I'm going to try it out and report my results back here.

samrocketman commented 8 years ago

I tested https://wiki.jenkins-ci.org/display/JENKINS/GroovyAxis and it is not a solution. It simply generates axis values using a groovy script. I'm looking for something along the lines of providing each matrix run with a label which the docker plugin would use to build the flyweight tasks on a docker provisioned slave.

KostyaSha commented 8 years ago

GroovyAxis is unrelated.

samrocketman commented 8 years ago

Right, that's what I commented.

Related issue https://issues.jenkins-ci.org/browse/JENKINS-22494. Seems flyweight tasks don't inherit the parent slave label restrictions.

KostyaSha commented 8 years ago

@samrocketman this is my last attempt to help, see https://github.com/KostyaSha/yet-another-docker-plugin/wiki/Usage#matrix-project

samrocketman commented 8 years ago

I see, your wiki document helped me to understand. So the matrix parent job is the flyweight task and the axis builds are normal tasks (and take up normal executors).

Why can't flyweight tasks be run on cloud nodes and why do flyweight tasks specifically have to run on the master (or a permanent node)?

KostyaSha commented 8 years ago

The flyweight task itself is the control logic, see the matrix job parent log. As soon as you kill the slave running it, it will die (according to my local experiments).

samrocketman commented 8 years ago

Thanks for your patience. How do I look at the matrix job parent log?

KostyaSha commented 8 years ago

Console output, like in standard builds. Strange that it shows workspace allocation. AFAIK flyweights don't use a workspace...

thomassuckow commented 8 years ago

Flyweights do get a workspace because the flyweight checks out the repo (for some reason). They get allocated to nodes based on the "Restrict where this project can run" option rather than the axes.

Docker-Plugin used to allow flyweights (this was more of an oversight than a feature) but it resulted in badness. That isn't to say it couldn't be resolved if people really want it.

How this went down last time (https://github.com/jenkinsci/docker-plugin/issues/148):

  1. The slave terminated, killing the flyweight, because it was unaware the flyweight was there.
  2. The plugin was modified not to kill the slave if it had a flyweight.
  3. Flyweightocalypse: flyweights would get distributed between docker slaves and prevent them from shutting down. Deadlock ensued on small clusters as none of the needed slaves were running. Defining a good rule for how to handle this is difficult because the flyweight could be started before the end of the job and never complete.
  4. We disabled flyweights from using docker containers.
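
The deadlock in step 3 can be illustrated with a toy model in Python. It is a deliberate simplification under stated assumptions: every flyweight lands on a different container and keeps it alive, and every child build needs a container of its own. Cluster and deadlocked are hypothetical names, not plugin code.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    container_cap: int  # maximum containers the cloud may run
    pinned: int = 0     # containers kept alive only because a flyweight sits on them

    def start_parent(self) -> bool:
        """A matrix parent (flyweight) lands on its own container and pins it."""
        if self.pinned >= self.container_cap:
            return False
        self.pinned += 1
        return True

    def can_start_child(self) -> bool:
        """A child build needs a container of its own (simplifying assumption)."""
        return self.pinned < self.container_cap


def deadlocked(container_cap: int, parents: int) -> bool:
    """True if the parents have pinned every slot, so no child can ever start."""
    cluster = Cluster(container_cap)
    for _ in range(parents):
        cluster.start_parent()
    return not cluster.can_start_child()


if __name__ == "__main__":
    print(deadlocked(container_cap=2, parents=2))   # small cluster
    print(deadlocked(container_cap=20, parents=2))  # larger cluster
```

On the small cluster the parents wait for children that can never be scheduled, which is the situation described above. The larger cluster still has free slots.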

KostyaSha commented 8 years ago

@thomassuckow if the flyweight uses a workspace, then it should consume an executor, and in that case the whole core design is broken.

thomassuckow commented 8 years ago

Core is a rat's nest.

KostyaSha commented 8 years ago

@thomassuckow do you know whether the workspace is fetched to child jobs from the flyweight? I may try to hack the slave to work with a single flyweight.

thomassuckow commented 8 years ago

Workspace is checked out / cloned separately on each normal build. I don't know why the flyweight checks it out at all.

KostyaSha commented 8 years ago

@thomassuckow the initial idea, as I understand it, was to get the exact copy to all child builds.

samrocketman commented 8 years ago

@thomassuckow @KostyaSha I tested adding a docker container as a "permanent node". I'm running a container with a lightweight init process with 4 executors defined.

It ran the flyweight task on the "permanent" docker container (i.e. it wasn't treated as a container). It then proceeded to build all child jobs in provisioned containers. A few thoughts come out of this.

It would be nice to make use of the "Experimental" docker features for the flyweight task. However, the Idle Timeout setting is missing for two of the experimental launch container features so there continues to be the bug of docker plugin starting a container but not waiting long enough for it to be initialized. This workaround could look like this:

For the flyweight task, it provisions a more permanent container in which it could run (let's say the container stays on until there's an idle time of 20 minutes). For the child jobs, it simply runs normally. This could potentially be a workaround in the existing core mess.

However, a few changes will need to be made:

ryd994 commented 7 years ago

I recently ran into the same issue. It has been a year. Are there any updates?

samrocketman commented 7 years ago

The situation is the same now as it was then. No updates in this area. To resolve it, follow https://github.com/jenkinsci/docker-plugin/issues/242#issuecomment-184338774

TorstenKruse commented 3 years ago

4 years later, it works for me now: I have a matrix project which is built on a docker plugin node.

samrocketman commented 2 years ago

This issue was OK to close. Following up from earlier comments, I would recommend using a Jenkins pipeline for matrix builds.