Dynamic provision of Docker Agent Templates

galindro commented 11 months ago

What feature do you want to see added?

It would be great if this plugin could do something similar to what the Kubernetes plugin does: it provides a high-level pipeline step called podTemplate, which makes it possible to define the pod to be created dynamically from the pipeline.

This gives a lot of flexibility, because we can decide during pipeline execution which pod template we want to use, and we can also change its values before passing them to the podTemplate step.

My use case is: I have to spin up some Docker containers which are tagged with our application version. It's cumbersome and a lot of work for us to create a Docker Agent Template via JCasC every time a new Docker image is published.

We unfortunately can't make use of docker-workflow-plugin because we run our containers on Windows (without WSL2) and that plugin hardcodes cmd.exe when the base image is a Windows image. Our pipeline needs bash.exe, so it can run on Linux and Windows without any issues.

Upstream changes

No response

Are you interested in contributing this feature?

No response

pjdarton commented 11 months ago

Probably better to raise a PR on the docker workflow plugin ... although, fyi, that particular "run cat or cmd.exe" code is just to run a command that'll sit and do nothing (keeping the container alive) while all the real work is done by other docker commands running things on the container. You can still use bash in those.

Basically, this plugin is for building builds that don't know about docker, using docker to provide build agents for any builds. The docker workflow plugin is for pipeline (aka workflow) builds that know they're using docker, that have opinions on how docker is used. If you're using pipelines and your build knows it's doing docker stuff then you probably should be making the docker workflow plugin fit what you need rather than asking this one to replicate that one.

CJCombrink commented 11 months ago

I do agree with @pjdarton, but at some point I also wondered about such a feature.

I think the strongest point of this plugin is, as commented, "building builds that don't know about docker".

For me it is also tedious to have to add templates each time for new base images. We tag a new image after each release, which happens probably at most 4 times a year, which creates 3 or more tags. Now and again we add new images.

I was wondering if there is a middle ground between CasC (we are not there yet) and some 'auto discovery' or templatization of templates, where one can set a template or pattern that then translates to an image to pull, using the same settings as those in the template. Just a crazy idea, not yet sure how feasible it would be, or what the options are.

Crazy idea: in addition to 'labels', have a 'label-patterns' field where you specify, for example, my-agent-(.*). Then a 'Docker Image Pattern' field that contains some-docker-image:$1, and finally in the pipeline agent { label 'my-agent-12' }, which causes the plugin to spin up a container from the image some-docker-image:12.

I am not sure if this is supported by Jenkins or the plugin framework; they would need to support it before something like this could even be considered.
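A minimal sketch of the label-pattern idea above, written as plain Java rather than against the real plugin code. The class `LabelPatternTemplate` and its method `resolveImage` are hypothetical names for illustration; nothing like this exists in docker-plugin today, and how it would hook into Jenkins label matching is left open.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a requested label to a Docker image via a regex.
// Not part of docker-plugin; names and behaviour are assumptions.
final class LabelPatternTemplate {
    private final Pattern labelPattern;  // e.g. my-agent-(.*)
    private final String imagePattern;   // e.g. some-docker-image:$1

    LabelPatternTemplate(String labelRegex, String imagePattern) {
        this.labelPattern = Pattern.compile(labelRegex);
        this.imagePattern = imagePattern;
    }

    // Returns the image name if the whole label matches the pattern, else empty.
    Optional<String> resolveImage(String label) {
        Matcher m = labelPattern.matcher(label);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Expand $1, $2, ... in the image pattern from the capture groups.
        StringBuffer image = new StringBuffer();
        m.appendReplacement(image, imagePattern);
        return Optional.of(image.toString());
    }

    public static void main(String[] args) {
        LabelPatternTemplate t =
            new LabelPatternTemplate("my-agent-(.*)", "some-docker-image:$1");
        System.out.println(t.resolveImage("my-agent-12").orElse("no match"));
        System.out.println(t.resolveImage("other-label").orElse("no match"));
    }
}
```

The pipeline side would stay as in the comment above (agent { label 'my-agent-12' }); only the template lookup changes. Since labels come from pipelines, a real implementation would probably also want to validate the captured value before using it as an image tag.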

galindro commented 11 months ago

@pjdarton, first of all, ty very much for your fast response. This isn't what normally happens in the Jenkins community...

I was expecting that you would come back with such an answer. Unfortunately, even though that particular "run cat or cmd.exe" code is just to run a command that'll sit and do nothing, if you try to run the pipeline below in a Windows container, it will fail. It only works with powershell/bat/cmd steps.

def windows_server = docker.image('my-image:version')
windows_server.inside("") {
    script {
        sh("echo Hi!")
    }
}

Of course, something like this might work, but that's exactly what we don't want to do, because we would then face other issues related to escaping characters, etc. It would be a nightmare.

def windows_server = docker.image('my-image:version')
windows_server.inside() {
    script {
        powershell("bash.exe -c 'echo Hi!'")
    }
}

That's why we decided to use this plugin instead of docker-workflow-plugin, because it respects the ENTRYPOINT of the image. Therefore, something like this works flawlessly:

node('my-image-version') { // this is how we name the agent templates, to mimic the image names and tags
    sh("echo Hi!")
}

with this simple JCasC config:

  - docker:
      dockerApi:
        connectTimeout: 60
        dockerHost:
          uri: "tcp://my-server:2375"
        readTimeout: 60
      exposeDockerHost: true
      name: "windows-docker"
      templates:
      - connector:
          jnlp:
            jnlpLauncher:
              webSocket: true
              workDirSettings:
                disabled: false
                failIfWorkDirIsMissing: false
                internalDir: "remoting"
        dockerTemplateBase:
          cpuPeriod: 0
          cpuQuota: 0
          environment:
          - "JENKINS_WEB_SOCKET=true"
          environmentsString: "JENKINS_WEB_SOCKET=true"
          image: "my-image:version"
        labelString: "my-image-version"
        mode: EXCLUSIVE
        name: "my-image-version"
        pullStrategy: PULL_NEVER
        pullTimeout: 300

I like the idea given by @CJCombrink. If it is possible to have it, that would be great.

@CJCombrink: my reality is even worse than yours: we have new releases on a daily basis. So we would have to automate the JCasC changes, commit them, etc... A massive process.

pjdarton commented 11 months ago

Back when I was using (and maintaining) this plugin, I also had a not-well-supported usecase - I had a set of 10 nearly-identical docker hosts, each with several docker agent templates, and needed to configure them nearly-identically ... and then replicate that across multiple Jenkins controllers. It was too much data to do manually with any reliability. So what I did was write some groovy code which I ran from the Jenkins admin console; code which ensured each docker cloud was defined nearly-identically, and that each template was defined etc. Any time I needed to update a template to point to a new image, I'd edit my groovy code and spam that into every Jenkins controller, which would then update the configuration.

These days, knowing what I know now, plus with the advances in Jenkins etc (I did that before jcasc existed and when pipelines were new), if I had to do that again I think I'd write that as a pipeline, supported by a Jenkins library that did the "needs admin" bits.

I suspect that your regular release process could be automated using a similar strategy. i.e. not "switch everything to use Jcasc" but still "configuration as code".
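As one illustration of "configuration as code" without hand-editing, the template entry could be generated from the image name. The sketch below is plain Java that only prints a YAML fragment; `AgentTemplateRenderer` is a hypothetical name, the fields are a subset of those in the JCasC config earlier in this thread, and whether that subset is enough for a real controller is an assumption that has not been checked.

```java
// Hypothetical generator for one docker-plugin template entry in JCasC form.
// It only builds text; applying it to a controller is out of scope here.
final class AgentTemplateRenderer {

    // "my-image:version" -> "my-image-version", the naming convention
    // described earlier in the thread.
    static String labelFor(String image) {
        return image.replace(':', '-');
    }

    // Renders a YAML list item meant to sit under a cloud's "templates:" key.
    static String render(String image) {
        String label = labelFor(image);
        return String.join("\n",
            "      - connector:",
            "          jnlp:",
            "            jnlpLauncher:",
            "              webSocket: true",
            "        dockerTemplateBase:",
            "          image: \"" + image + "\"",
            "        labelString: \"" + label + "\"",
            "        mode: EXCLUSIVE",
            "        name: \"" + label + "\"",
            "        pullStrategy: PULL_NEVER");
    }

    public static void main(String[] args) {
        // The tag "1.2.3" is a made-up example value.
        System.out.println(render("my-image:1.2.3"));
    }
}
```

A release job could call something like this once per published tag and commit the result, which keeps the "one more thing to maintain" down to a single small generator.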

galindro commented 11 months ago

Yes, true. I could build up such a process to update the Jenkins configuration seamlessly, as you did. However, this is one more piece that would have to be added on top of the process. One more thing to maintain. One more thing that can break.

Also, we can't simply add such configs without relying on JCasC and checking those changes in to a git repository, because our Jenkins controller runs in Kubernetes. Therefore, it is totally ephemeral. Each time its pod gets re-scheduled or dies for any random reason, it is configured from scratch via JCasC (tks to the official Jenkins helm chart).

So, it's not so simple... if it were, be sure that I wouldn't have opened this thread.

CJCombrink commented 11 months ago

PS: Totally unrelated to the question: @galindro, maybe look at the following SO answer: https://stackoverflow.com/a/43514090/991000. We are fortunate enough that our images have git installed, thus we have sh and can use it on Windows as well. Also see this ticket on the Jenkins JIRA: https://issues.jenkins.io/browse/JENKINS-33708 (you will see I commented in 2018 already). We don't use it as per your example, but I don't understand why it should not work if you configure your image correctly.

galindro commented 11 months ago

Hi @CJCombrink, tks for the tip, but it still doesn't work in my case. I have git installed in the image so it can provide bash.exe and sh.exe. Both are correctly set in the PATH. Also, this isn't the issue that I get:

java.io.IOException: Cannot run program "nohup" (in directory "C:\Program Files (x86)\Jenkins\workspace\fathertime"): CreateProcess error=2, The system cannot find the file specified

The image that we use is based on windowsservercore-ltsc2019. We just install our application and PortableGit.zip (which provides bash and sh) and we ofc modify the PATH to include them.

However, this doesn't work:

def windows_server = docker.image('my-image:version')
windows_server.inside() {
    script {
        sh("echo Hi!")
    }
}

The error is the one below (from the docker daemon). It sounds like when the docker command is cmd.exe and the step is sh(), Jenkins tries to execute sh or bash and can't find it, even though they are correctly set in the container's PATH.

exec's CreateProcess() failed [container=032288fe9e7913a6df49b371022cc54ce350e9aa25624e04799870bf39d93218 exec=6d028d0b3df53297455ced0885e9f7fa53bbb4ce542600d8669585bcbcb4540b module=libcontainerd error=container 032288fe9e7913a6df49b371022cc54ce350e9aa25624e04799870bf39d93218 encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

We tried a lot of approaches: changing the ENTRYPOINT to bash.exe / sh.exe, but it still doesn't work. We also tried to set the default shell location in the Jenkins general config... As I said, this is all because of that crap hardcoded cmd.exe.

Ah! There is another issue that I forgot to mention -> https://issues.jenkins.io/browse/JENKINS-30600. It's not possible to run checkout() inside it 🤦‍♂️. But with docker-plugin, it works perfectly. So, I really refuse to go back to docker-workflow-plugin. It unfortunately proved that it doesn't work for our use case.

Tks for your attempt to help me. Very kind 🙂

CJCombrink commented 11 months ago

Back to the topic: A quick search turned up public boolean canProvision(Label label) I have no experience with clouds, but would my crazy idea not be implementable from this function? I wonder if the label gets there as-is from the Controller when specifying agent { label 'my-agent-12' }

I am assuming the following controller flow:

Get label from pipeline
Iterate all agents + clouds and ask canProvision()
Do some load balancing
call provision() on the cloud that said yes
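The assumed flow can be sketched with a toy model. `ToyCloud` and `ToyController` are made-up stand-ins, not the Jenkins API: the real method quoted above takes a `Label` rather than a string, and the load-balancing step is skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy stand-in for a cloud plugin; not the real Jenkins Cloud class.
interface ToyCloud {
    boolean canProvision(String label);
    String provision(String label); // returns the name of the new agent
}

// Toy stand-in for the controller side of the flow.
final class ToyController {
    private final List<ToyCloud> clouds = new ArrayList<>();

    void addCloud(ToyCloud cloud) {
        clouds.add(cloud);
    }

    // Ask each cloud in turn; provision on the first one that says yes.
    Optional<String> requestAgent(String label) {
        for (ToyCloud cloud : clouds) {
            if (cloud.canProvision(label)) {
                return Optional.of(cloud.provision(label));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToyController controller = new ToyController();
        // A cloud that accepts any label starting with "my-agent-".
        controller.addCloud(new ToyCloud() {
            public boolean canProvision(String label) {
                return label.startsWith("my-agent-");
            }
            public String provision(String label) {
                return "container-for-" + label;
            }
        });
        System.out.println(controller.requestAgent("my-agent-12").orElse("nobody can provision"));
        System.out.println(controller.requestAgent("windows").orElse("nobody can provision"));
    }
}
```

If the label really does arrive as-is, a pattern-based canProvision() would only need to answer "yes" for matching labels and remember which image the label implies.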

I would really like to dive into some tests, but unfortunately I do not have any capacity for more random work.

Peter-Darton-i2 commented 11 months ago

As an ex-maintainer of this plugin, I can say that your assumptions look correct.

Builds ask for agents that match a label specification - sometimes one label, sometimes a set of them.
Jenkins goes and asks the clouds "can you do this?" ... and then asks one (or more) of them "Please make me something to satisfy this label specification".
There's a bit of logic to handle "what've we already set in motion that isn't fully running yet" (otherwise Jenkins can ask for more than is needed, especially if agents are slow to start) but, other than that ...
the cloud plugin does its magic to make a new agent, adds it to Jenkins,
and then Jenkins stops asking for more agents.
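The "already set in motion" bookkeeping can be illustrated with simple arithmetic. This is a deliberately simplified sketch, not how Jenkins actually computes demand; `ProvisioningDemand` and its parameter names are hypothetical.

```java
// Simplified illustration of why agents that are still starting must be
// counted: otherwise every check would request the full shortfall again.
final class ProvisioningDemand {

    // How many additional agents to ask the cloud for right now.
    static int agentsToRequest(int queuedBuilds, int idleAgents, int agentsStillStarting) {
        int alreadyCovered = idleAgents + agentsStillStarting;
        return Math.max(0, queuedBuilds - alreadyCovered);
    }

    public static void main(String[] args) {
        // 5 builds waiting, 1 idle agent, 3 containers still starting up.
        System.out.println(agentsToRequest(5, 1, 3));
        // Slow-starting agents already cover the queue: ask for nothing.
        System.out.println(agentsToRequest(2, 0, 4));
    }
}
```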

In the case of the docker-plugin, the docker templates each have a label specification, and that's how the docker-plugin decides "would one of these satisfy what Jenkins is asking for". In the case of the docker-workflow-plugin, it does not provide agents, rather it runs stuff inside agents that "something else" provides.

e.g. we had the AWS EC2 plugin spin up Linux VMs on demand and then had the docker-workflow-plugin pull our own custom-made docker images onto those and then run our builds inside our containers that were running on the EC2 VM. While that sounds overly complicated, it did mean that we didn't have any long-lived VMs whose OSs we needed to maintain/patch etc.

My guess is that this "nohup" thing is hard-coded in the docker-workflow plugin, (or maybe a different plugin - one that plugin uses to connect), and that the Windows containers don't have "nohup" on the $PATH. If you can get a Java stacktrace pointing at the nohup error then that'd help ... but it'd still be a docker-workflow-plugin issue not a docker-plugin issue. FYI there are other docker plugins for Jenkins too, e.g. "yet another docker" plugin.

Ultimately you have a choice - either find something that works, or find something that looks like it ought to work and then fix it until it does ... but beware that once you go down that slippery path, it's hard to stop. 😉

galindro commented 10 months ago

Hi @Peter-Darton-i2 ,

Regarding the "yet another docker" plugin: its last release was 4 years ago. It looks like an unmaintained plugin, which makes it quite a dangerous path to go down: https://plugins.jenkins.io/yet-another-docker-plugin/

@pjdarton: as far as I understand, you aren't a maintainer of this project anymore, right?

I would really love to know if some active developer could handle this subject.

krisstern commented 10 months ago

Hi @galindro,

My apologies for the late response. I believe I am one of the remaining active maintainers of this plugin. We could look into implementing your suggested feature. I will need to spend some time doing some background research first, though, before getting back to you.

galindro commented 9 months ago

Hey @krisstern, did you have any time to look into this request?

krisstern commented 5 months ago

My apologies @galindro... I have been preoccupied with other tasks. Let me return to this next week, but it may take anywhere from a few weeks to a few months before I can come up with anything.

galindro commented 5 months ago

Don't be sorry man. I totally understand you. Take your time!

galindro commented 1 week ago

Hey @krisstern. Did you have time to look into this?

krisstern commented 1 week ago

Hey @galindro, we are following this up internally amongst the maintainers and will update here once we have more info.

krisstern commented 1 week ago

Hi @galindro, we are thinking about turning this task into a GSoC project. Would that be okay timeline-wise? It will only happen if Jenkins is selected as a mentoring org, and the program only runs during the summer, from May to around September. Also, if we go ahead with this as a GSoC project, would you be interested in co-mentoring it? It normally does not take more than 5 hours a week unless you are a lead mentor.

jenkinsci / docker-plugin