jenkinsci / docker-plugin

Jenkins cloud plugin that uses Docker
https://plugins.jenkins.io/docker-plugin/
MIT License

Issues when using SSH connection method against IPv6-enabled agents #839

Closed jay7x closed 1 year ago

jay7x commented 3 years ago

Version report

Jenkins and plugins versions report:

Jenkins: 2.263.3 OS: Linux - 4.15.0-1113-azure

Plugins: ``` ace-editor:1.1 ansicolor:0.7.5 ant:1.11 antisamy-markup-formatter:2.1 apache-httpcomponents-client-4-api:4.5.13-1.0 authentication-tokens:1.4 azure-ad:1.2.1 azure-commons:1.0.5 basic-branch-build-strategies:1.3.2 bitbucket-pullrequest-builder:1.5.0 block-queued-job:0.2.0 blueocean-autofavorite:1.2.4 blueocean-bitbucket-pipeline:1.24.4 blueocean-commons:1.24.4 blueocean-config:1.24.4 blueocean-core-js:1.24.4 blueocean-dashboard:1.24.4 blueocean-display-url:2.4.1 blueocean-events:1.24.4 blueocean-git-pipeline:1.24.4 blueocean-github-pipeline:1.24.4 blueocean-i18n:1.24.4 blueocean-jira:1.24.4 blueocean-jwt:1.24.4 blueocean-personalization:1.24.4 blueocean-pipeline-api-impl:1.24.4 blueocean-pipeline-editor:1.24.4 blueocean-pipeline-scm-api:1.24.4 blueocean-rest-impl:1.24.4 blueocean-rest:1.24.4 blueocean-web:1.24.4 blueocean:1.24.4 bootstrap4-api:4.6.0-1 bouncycastle-api:2.20 branch-api:2.6.2 build-timeout:1.20 caffeine-api:2.9.1-23.v51c4e2c879c8 cctray-xml:1.0 checks-api:1.4.1 cloud-stats:0.26 cloudbees-bitbucket-branch-source:2.9.7 cloudbees-disk-usage-simple:0.10 cloudbees-folder:6.15 command-launcher:1.5 config-file-provider:3.7.0 configuration-as-code:1.51 credentials-binding:1.24 credentials:2.3.14 display-url-api:2.3.4 docker-build-publish:1.3.2 docker-commons:1.17 docker-java-api:3.1.5.2 docker-plugin:1.2.2 docker-workflow:1.25 durable-task:1.35 echarts-api:4.9.0-3 email-ext:2.81 embeddable-build-status:2.0.3 extended-read-permission:3.2 external-monitor-job:1.7 favorite:2.3.2 font-awesome-api:5.15.2-1 git-client:3.6.0 git-server:1.9 git:4.5.2 github-api:1.122 github-branch-source:2.9.5 github-pullrequest:0.2.8 github:1.32.0 google-oauth-plugin:1.0.3 gradle:1.36 greenballs:1.15.1 handlebars:1.1.1 handy-uri-templates-2-api:2.1.8-1.0 hashicorp-vault-plugin:3.7.0 htmlpublisher:1.25 icon-shim:2.0.3 jackson2-api:2.12.1 javadoc:1.6 jclouds-jenkins:2.20 jdk-tool:1.4 jenkins-design-language:1.24.4 jira:3.1.3 jjwt-api:0.11.2-8.82737cbfa6f5 
jquery-detached:1.2.1 jquery3-api:3.5.1-2 jquery:1.12.4-1 jsch:0.1.55.2 junit:1.48 kubernetes-cli:1.10.0 kubernetes-client-api:4.13.2-1 kubernetes-credentials:0.8.0 kubernetes:1.29.0 lockable-resources:2.10 mailer:1.32.1 mapdb-api:1.0.9.0 mask-passwords:2.13 matrix-auth:2.6.6 matrix-project:1.18 mercurial:2.12 metrics:4.0.2.7 momentjs:1.1.1 notification:1.14 oauth-credentials:0.4 okhttp-api:3.14.9 ownership:0.13.0 pam-auth:1.6 parameterized-scheduler:0.9.2 pipeline-build-step:2.13 pipeline-github-lib:1.0 pipeline-graph-analysis:1.10 pipeline-input-step:2.12 pipeline-milestone-step:1.3.2 pipeline-model-api:1.8.3 pipeline-model-definition:1.8.3 pipeline-model-extensions:1.8.3 pipeline-rest-api:2.19 pipeline-stage-step:2.5 pipeline-stage-tags-metadata:1.8.3 pipeline-stage-view:2.19 pipeline-utility-steps:2.6.1 plain-credentials:1.7 plugin-util-api:1.6.1 popper-api:1.16.1-1 prometheus:2.0.8 pubsub-light:1.13 resource-disposer:0.14 role-strategy:3.1 scm-api:2.6.4 script-security:1.76 slack:2.45 snakeyaml-api:1.27.0 sse-gateway:1.24 ssh-credentials:1.18.1 ssh-slaves:1.31.5 structs:1.21 subversion:2.14.0 timestamper:1.11.8 token-macro:2.13 trilead-api:1.0.13 variant:1.4 webhook-step:1.4 windows-slaves:1.7 workflow-aggregator:2.6 workflow-api:2.41 workflow-basic-steps:2.23 workflow-cps-global-lib:2.17 workflow-cps:2.87 workflow-durable-task-step:2.37 workflow-job:2.40 workflow-multibranch:2.22 workflow-scm-step:2.12 workflow-step-api:2.23 workflow-support:3.7 ws-cleanup:0.38 ```

Docker version on agents: 20.10.7

`docker version` output:

```
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:38 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:50 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

OS: Ubuntu 20.04 LTS on Jenkins master and every agent.

Reproduction steps

Results

Expected result:

Jenkins can spin up a new agent and connect to it using SSH at any time.

Actual result:

Jenkins can spin up a new agent but is unable to connect to it using SSH, for the reason explained below.

From docker ps output:

0f1a5876f016   [REDACTED]/jenkins-ci-dinfra:stable   "setup-sshd /usr/sbi…"   3 minutes ago   Up 3 minutes   0.0.0.0:49243->22/tcp, :::49242->22/tcp   musing_boyd
0a5530eb2201   [REDACTED]/jenkins-ci-dinfra:stable   "setup-sshd /usr/sbi…"   5 hours ago     Up 5 hours     0.0.0.0:49205->22/tcp, :::49204->22/tcp   vigorous_blackwell

You can see that the IPv4 port differs from the IPv6 port (49243 vs 49242). Somehow Jenkins uses the IPv6 port when trying to ssh into the agent.

I ran docker inspect and collected the Jenkins logs for a different container (not the ones in the docker ps output above), but the situation is the same.

Logs from Jenkins master (hostnames are altered):

SSHLauncher{host='slavep3.node', port=49739, credentialsId='13457128-567e-4f7d-bd8c-1e85c619b69e', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[06/16/21 12:22:47] [SSH] Opening SSH connection to slavep3.node:49739.
Connection refused (Connection refused)
[long java trace here]

NetworkSettings.Ports from docker inspect output:

            "Ports": {
                "22/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "49740"
                    },
                    {
                        "HostIp": "::",
                        "HostPort": "49739"
                    }
                ]
            },

As you can see, Jenkins was trying to connect to port 49739 over IPv4 (we don't have IPv6 connectivity at the moment), but docker-proxy was listening on port 49740 for IPv4 instead.
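To make the mismatch concrete, here is a minimal sketch that takes the two bindings from the docker inspect output above and picks the port published on an IPv4 address. The `Binding` class and the `ipv4Port` helper are hypothetical names used only for illustration; they are not part of the plugin or of docker-java.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.OptionalInt;

public class Ipv4BindingPicker {
    /** Hypothetical stand-in for one entry of NetworkSettings.Ports["22/tcp"]. */
    static final class Binding {
        final String hostIp;
        final String hostPort;

        Binding(String hostIp, String hostPort) {
            this.hostIp = hostIp;
            this.hostPort = hostPort;
        }
    }

    /** Returns the host port of the first binding whose HostIp is an IPv4 literal. */
    static OptionalInt ipv4Port(List<Binding> bindings) {
        for (Binding b : bindings) {
            try {
                // For a literal such as "0.0.0.0" or "::" this only parses the text.
                if (InetAddress.getByName(b.hostIp) instanceof Inet4Address) {
                    return OptionalInt.of(Integer.parseInt(b.hostPort));
                }
            } catch (UnknownHostException e) {
                // Not a usable address: skip this binding.
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Values copied from the docker inspect output in this report.
        List<Binding> bindings = List.of(
                new Binding("0.0.0.0", "49740"),
                new Binding("::", "49739"));
        System.out.println(ipv4Port(bindings).getAsInt()); // prints 49740
    }
}
```

With the bindings from this report, the IPv4 port is 49740, while Jenkins attempted 49739.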

jay7x commented 3 years ago

Our pipelines spawn other docker containers to run some tests, and those can theoretically occupy a port above 40000 for a while. So docker may fail to listen on that port for IPv4 and choose the next one. We're not using IPv6 here, which (I guess) is why the ports differ. But why Jenkins always chooses the IPv6 one is another question...

edlevin6612 commented 3 years ago

I noticed similar behavior after swapping an old Docker VM for one supporting IPv6. The issue seems to happen sporadically in my case: agents spin up fine, then after a while we see the same behavior as outlined above, except that we get a different error in the Jenkins log:

java.io.IOException: SSH service hadn't started after 60 seconds and 52 milliseconds.Try increasing the number of retries (currently 30) and/or the retry wait time (currently 2) to allow for containers taking longer to start.
    at io.jenkins.docker.connector.DockerComputerSSHConnector.createLauncher(DockerComputerSSHConnector.java:269)
    at io.jenkins.docker.connector.DockerComputerConnector.createLauncher(DockerComputerConnector.java:91)
    at com.nirima.jenkins.plugins.docker.DockerTemplate.doProvisionNode(DockerTemplate.java:574)
    at com.nirima.jenkins.plugins.docker.DockerTemplate.provisionNode(DockerTemplate.java:536)
    at com.nirima.jenkins.plugins.docker.DockerCloud$1.run(DockerCloud.java:370)

Restarting the docker daemon resolves the issue until the next time it occurs. I am in AWS, and the security group fronting the Docker instance currently does not allow IPv6 ingress. Next time the issue occurs, I am going to allow IPv6 traffic to see if it has an effect.

mikjonsson commented 3 years ago

I think I'm hitting the same issue. IPv4 port is bound to 49222 and IPv6 to 49221, see below.

CONTAINER ID   IMAGE        COMMAND                  CREATED          STATUS          PORTS                                     NAMES
1e3b258d65c8   <MY_IMAGE>   "/usr/sbin/sshd -D -…"   14 seconds ago   Up 13 seconds   0.0.0.0:49222->22/tcp, :::49221->22/tcp   distracted_tesla

Meanwhile, Jenkins tries to connect to the IPv4 address using the IPv6 port, as the Jenkins log shows:

Could not connect to <MY_IP> port 49221. Are you sure this location is contactable from Jenkins?

Our workaround was to disable IPv6 on the host machine.

edlevin6612 commented 3 years ago

I disabled IPv6 support in Docker daemon config and haven't had the issue reoccur.

mikjonsson commented 3 years ago

I may be out on a limb here, as I've only been browsing the code on GitHub and haven't debugged it (and may not even be looking at the correct part of the code for all I know), but in [DockerComputerSSHConnector.java getBindingForPort](https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#:~:text=private%20static%20InetSocketAddress-,getBindingForPort,-(DockerAPI%20api%2C%20InspectContainerResponse) there's this:

        // Find where it's mapped to
        for (Ports.Binding b : sshBindings) {
            String hps = b.getHostPortSpec();
            port = Integer.valueOf(hps);
        }
        String host = getExternalIP(api, ir, networkSettings, sshBindings);
        return new InetSocketAddress(host, port);

It looks like, in the case of multiple bindings, it will always return the port of the last binding in sshBindings without validating that it is the correct port, which may cause an issue if the correct port is earlier in the array.
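A minimal sketch of that behaviour, assuming the bindings arrive in the order shown in the docker inspect output from the original report (IPv4 first, IPv6 last). The `Binding` class is a hypothetical stand-in for docker-java's `Ports.Binding`, and the `-1` initial value is a placeholder rather than what the plugin actually uses.

```java
import java.util.List;

public class LastBindingWins {
    /** Hypothetical stand-in for Ports.Binding, reduced to the two fields used here. */
    static final class Binding {
        final String hostIp;
        final String hostPortSpec;

        Binding(String hostIp, String hostPortSpec) {
            this.hostIp = hostIp;
            this.hostPortSpec = hostPortSpec;
        }
    }

    /** Mirrors the quoted loop: each iteration overwrites port, so the last binding wins. */
    static int portAsChosenByLoop(List<Binding> sshBindings) {
        int port = -1; // placeholder initial value (assumption)
        for (Binding b : sshBindings) {
            port = Integer.parseInt(b.hostPortSpec);
        }
        return port;
    }

    public static void main(String[] args) {
        List<Binding> sshBindings = List.of(
                new Binding("0.0.0.0", "49740"), // IPv4 mapping
                new Binding("::", "49739"));     // IPv6 mapping, listed last
        System.out.println(portAsChosenByLoop(sshBindings)); // prints 49739
    }
}
```

If the daemon listed the IPv6 mapping first, the same loop would return 49740, so the outcome depends purely on ordering.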

pjdarton commented 3 years ago

Yup, that's the correct bit of code. The problem is that the plugin doesn't really know which IP/port is going to be "the one that works". It has no visibility of the network environment in which Jenkins runs and doesn't know which IPs are routable and which aren't, so it has to blindly believe the docker daemon's output. FYI, this problem is common to other "cloud provider" plugins too: the plugin can't (easily) second-guess the operating system's routing table and/or whatever external routes exist to decide "we'll ignore that one as we know IPv6 won't work here" etc. IME, when the Jenkins master can't reliably SSH to a remote agent over its network, it's best to use JNLP and have the remote agent call Jenkins instead.

If y'all can figure out some means by which the plugin could make a decision (and then submit a PR for it), that would be welcomed, but if you merely need a workaround, I'd suggest using JNLP or "Direct Attach" instead of SSH.
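One possible way to make that decision, sketched here purely as an idea and not as anything the plugin does, is to probe each candidate host/port with a short TCP connect and keep the first one that answers. The class and method names below are hypothetical; only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ReachableBindingProbe {
    /** Returns the first candidate that accepts a TCP connection within timeoutMillis. */
    static Optional<InetSocketAddress> firstReachable(List<InetSocketAddress> candidates,
                                                      int timeoutMillis) {
        for (InetSocketAddress candidate : candidates) {
            try (Socket socket = new Socket()) {
                socket.connect(candidate, timeoutMillis);
                return Optional.of(candidate);
            } catch (IOException e) {
                // Refused, unresolved, unreachable, or timed out: try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Usage: java ReachableBindingProbe <host> <port> [<port> ...]
        if (args.length < 2) {
            System.err.println("usage: ReachableBindingProbe <host> <port> [<port> ...]");
            return;
        }
        List<InetSocketAddress> candidates = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            candidates.add(new InetSocketAddress(args[0], Integer.parseInt(args[i])));
        }
        System.out.println(firstReachable(candidates, 2000));
    }
}
```

This is only a heuristic: a port that accepts a connection is not proof that this container's sshd is behind it, and the probe is only meaningful when run from the machine that will open the SSH connection.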

Sulphurium-Brimstone commented 3 years ago

This may be wrong, but from what I can tell the code above sets the port number to that of the last binding. However, when it is a swarm, getExternalIP returns the IP of the first binding that is not 0.0.0.0. If that is the case, it would explain the issue. It seems to me that the IP and port need to come from the same binding returned by docker.

        if (api.isSwarm()) {
            for (Ports.Binding b : sshBindings) {
                String ipAddress = b.getHostIp();
                if (ipAddress != null && !"0.0.0.0".equals(ipAddress)) {
                    return ipAddress;
                }
            }
        }

skorzhevsky commented 3 years ago

Does anyone know when this was introduced? We have this annoying issue and want to revert to an older version if that helps.

skorzhevsky commented 3 years ago

> I disabled IPv6 support in Docker daemon config and haven't had the issue reoccur.

It is disabled by default, according to 'man dockerd':

       --ipv6=true|false
         Enable IPv6 support. Default is false.

edlevin6612 commented 3 years ago

Sorry, forgot to mention: I had "ipv6": true in /etc/docker/daemon.json initially (the AMI was custom built with IPv6 enabled) but then disabled it after running into this issue.

edlevin6612 commented 3 years ago

Apparently disabling the "ipv6" option did not fix the issue for me after all, as it kept recurring. This option does not prevent Docker from mapping container ports to IPv6 addresses. What I ultimately did was disable IPv6 in the kernel on my Docker host:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1

datalogics-robb commented 2 years ago

I'm also seeing this issue, and may have a breadcrumb to help. I set up docker on an ARMv8 Linux machine and we've had no issues. Then I attempted to add an x86_64 docker server, and nothing can connect because the ssh daemon's IPv4 mapping is on a different port than the one the plugin looks for, which is the IPv6 port.

Also, adding the --ipv6=false flag to the dockerd command seemed to resolve the issue.

basil commented 1 year ago

Please re-enable IPv6 in Docker and try the incremental build from #962 to see whether it resolves the issue for you.

basil commented 1 year ago

If the workaround added in #962 is insufficient, then please open a new ticket with steps to reproduce the issue.