jenkinsci / docker-agent

Jenkins agent (base image) and inbound agent Docker images
https://hub.docker.com/r/jenkins/inbound-agent/
MIT License

Waiting for agent to connect #612

Status: Closed (gfrid closed this issue 1 year ago)

gfrid commented 2 years ago

Jenkins and plugins versions report

Jenkins: 2.342
OS: Linux - 5.4.0-1060-aws

ace-editor:1.1 antisamy-markup-formatter:2.1 apache-httpcomponents-client-4-api:4.5.13-1.0 authentication-tokens:1.4 authorize-project:1.3.0 azure-ad:1.2.3 azure-commons:1.1.1 bootstrap4-api:4.6.0-1 bouncycastle-api:2.25 branch-api:2.6.3 build-timeout:1.20 build-user-vars-plugin:1.7 caffeine-api:2.9.2-29.v717aac953ff3 checks-api:1.5.0 cloudbees-folder:6.714.v79e858ef76a_2 command-launcher:1.5 credentials:2.6.1 credentials-binding:1.27.1 dark-theme:0.0.12 display-url-api:2.3.5 durable-task:495.v29cd95ec10f2 echarts-api:4.9.0-3 email-ext:2.81 embeddable-build-status:2.0.3 font-awesome-api:5.15.2-1 git:4.6.0 git-client:3.6.0 git-server:1.9 github:1.33.1 github-api:1.122 github-branch-source:2.10.1 gradle:1.36 handlebars:1.1.1 htmlpublisher:1.25 jackson2-api:2.13.2-260.v43d711474c77 javax-activation-api:1.2.0-2 javax-mail-api:1.6.2-5 jdk-tool:1.5 jjwt-api:0.11.2-9.c8b45b8bb173 jquery-detached:1.2.1 jquery3-api:3.5.1-2 jsch:0.1.55.2 junit:1.48 kubernetes:3580.v78271e5631dc kubernetes-client-api:5.12.1-187.v577c3e368fb_6 kubernetes-credentials:0.9.0 lockable-resources:2.10 mailer:408.vd726a_1130320 mask-passwords:3.0 matrix-auth:2.6.6 matrix-project:1.20 metrics:4.0.2.8.1 momentjs:1.1.1 naginator:1.18.1 okhttp-api:3.14.9 pam-auth:1.6 parameter-separator:1.3 pipeline-build-step:2.13 pipeline-graph-analysis:1.10 pipeline-input-step:2.12 pipeline-milestone-step:1.3.2 pipeline-model-api:1.9.3 pipeline-model-definition:1.8.4 pipeline-model-extensions:1.9.3 pipeline-rest-api:2.19 pipeline-stage-step:2.5 pipeline-stage-tags-metadata:1.8.4 pipeline-stage-view:2.19 pipeline-utility-steps:2.8.0 plain-credentials:1.7 plugin-util-api:1.7.1 popper-api:1.16.1-1 resource-disposer:0.14 role-strategy:3.1 saml:2.0.0 scm-api:2.6.5 script-security:1138.v8e727069a_025 snakeyaml-api:1.29.1 ssh-credentials:1.18.1 ssh-slaves:1.31.5 sshd:3.0.3 strict-crumb-issuer:2.1.0 structs:308.v852b473a2b8c theme-manager:0.6 thinBackup:1.10 timestamper:1.11.8 token-macro:2.15 trilead-api:1.0.13 
validating-email-parameter:1.10 validating-string-parameter:2.8 variant:1.4 windows-slaves:1.8 workflow-api:1143.v2d42f1e9dea_5 workflow-basic-steps:2.23 workflow-cps:2660.vb_c0412dc4e6d workflow-cps-global-lib:2.18 workflow-durable-task-step:2.37 workflow-job:1145.v7f2433caa07f workflow-multibranch:2.22 workflow-scm-step:2.13 workflow-step-api:622.vb_8e7c15bc95a workflow-support:3.8 ws-cleanup:0.38

What Operating System are you using (both controller, and any agents involved in the problem)?

Windows Server 2019 Core node with Kubernetes 1.21 on EKS

Reproduction steps

podTemplate(yaml: '''
apiVersion: v1
kind: Pod
spec:
  containers:

You can use any MCR (Microsoft Container Registry) image, for example.

Expected Results

The JNLP agent should connect to the master server. The pods are running:

k8s-test-429-l3k68-vsq05-3g4xl 2/2 Running

Actual Results

Waiting for agent to connect (30/1000): k8s-test-429-l3k68-vsq05-3g4xl
Waiting for agent to connect (40/1000): k8s-test-429-l3k68-vsq05-3g4xl

HTTP ERROR 404 Not Found

URI: /computer/k8s-test-429-l3k68-vsq05-3g4xl/logText/progressiveHtml
404 Not Found

Anything else?

java.lang.IllegalStateException: Node was deleted, computer is null
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:193)
    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:297)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Apr 18, 2022 3:30:37 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Terminating Kubernetes instance for agent k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:37 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null: k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:37 PM INFO hudson.slaves.AbstractCloudSlave terminate FATAL: Computer for agent is null: k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:46 PM WARNING io.fabric8.kubernetes.client.Config tryServiceAccount Error reading service account token from: [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.

MarkEWaite commented 2 years ago

Remoting agent jenkins/agent:4.6-1-jdk8-windowsservercore-1809 is out of date.

Use jenkins/agent:4.13-1-jdk8-windowsservercore-1809

gfrid commented 2 years ago

There is no such image. I am now trying jenkins/inbound-agent:4.13-1-jdk11-nanoserver-1809.

gfrid commented 2 years ago

[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulled] Successfully pulled image "jenkins/inbound-agent:4.13-1-jdk11-nanoserver-1809" in 5m0.0400562s
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Created] Created container jnlp
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Started] Started container jnlp
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulling] Pulling image "743808500811.dkr.ecr.us-east-1.amazonaws.com/core-net-powershell-azure-ad:latest"
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulled] Successfully pulled image "743808500811.dkr.ecr.us-east-1.amazonaws.com/core-net-powershell-azure-ad:latest" in 646.0898ms
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Created] Created container powershell
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Started] Started container powershell
jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0 Container jnlp was terminated (Exit Code: 1, Reason: Error)

gfrid commented 2 years ago

The same happens with jenkins/inbound-agent:4.13-1-windowsservercore-ltsc2019. By the way, I am using a TCP connection.

SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null
INFO hudson.slaves.AbstractCloudSlave terminate FATAL: Computer for agent is null

gfrid commented 2 years ago

Something is definitely wrong with the Windows JNLP agent; the Linux JNLP agent runs correctly.

Terminating Kubernetes instance for agent k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:08:07 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null: k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:08:07 PM INFO hudson.slaves.AbstractCloudSlave terminate WINDOWS ----> FATAL: Computer for agent is null: k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:10:39 PM INFO hudson.slaves.NodeProvisioner update k8s-test-451-r1nl3-53xpb-k91zc provisioning successfully completed. We have now 4 computer(s)
Apr 18, 2022 6:10:39 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch Created Pod: kubernetes jenkins/k8s-test-451-r1nl3-53xpb-k91zc
Apr 18, 2022 6:10:42 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch Pod is running: kubernetes jenkins/k8s-test-451-r1nl3-53xpb-k91zc
Apr 18, 2022 6:10:43 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run LINUX ---> Accepted JNLP4-connect connection #726 from 10.xxx.xxx.73/10.xxx.xxx.73:33490

gfrid commented 2 years ago

@MarkEWaite my EKS node runs Windows Server Datacenter 1809, build 10.0.17763.2686. The person in the May 12th comment says he is reverting to 10.0.17763.1158; I am not sure whether that build is available on AWS Marketplace.

I will try the first available image. The earliest I can get is 10.0.17763.529.

That does not work either. With this node version, Kubernetes is 1.11.

gfrid commented 2 years ago

Update: I got debug logs from the pod running jenkins/inbound-agent:4.13-1-windowsservercore-ltsc2019:

Url is required
At C:\ProgramData\Jenkins\jenkins-agent.ps1:26 char:122

gfrid commented 2 years ago

Further investigation: without setting the URL as a variable, this is what I get:

Url is required
At C:\ProgramData\Jenkins\jenkins-agent.ps1:26 char:122

If I set the variable, this is what I get:

kubectl logs -c jnlp -f k8s-test-536-00mtt-vrgql-ddkm4 -n jenkins

option "-direct (-directConnection)" cannot be used with the option(s) [-url, -tunnel]
java -jar agent.jar [options...]
 -agentLog FILE : Local agent error log destination (overrides workDir)
 -cert VAL : Specify additional X.509 encoded PEM certificates to trust when connecting to Jenkins root URLs. If starting with @ then the remainder is assumed to be the name of the certificate file to read.
 -credentials USER:PASSWORD : HTTP BASIC AUTH header to pass in for making HTTP requests.
 -direct (-directConnection) HOST:PORT : Connect directly to this TCP agent port, skipping the HTTP(S) connection parameter download. For example, "myjenkins:50000".
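
The error above says that agent.jar rejects -direct when -url or -tunnel is also given. Below is a minimal Python sketch of that mutual-exclusion rule, which could be used to check a set of launch options before starting the container. The function name find_option_conflicts and the option dictionary are hypothetical, and only the rule quoted in the error message is modelled, not the agent's full option handling.

```python
from typing import Dict, List, Optional

# Options that, per the error message in this thread, cannot be combined
# with -direct. This is an assumption drawn from that one message only.
EXCLUSIVE_WITH_DIRECT = ("-url", "-tunnel")


def find_option_conflicts(options: Dict[str, Optional[str]]) -> List[str]:
    """Return the options that clash with -direct, or [] if there is no clash.

    `options` maps an option name (e.g. "-url") to its value; a missing key,
    None, or an empty string means the option is not set.
    """
    if not options.get("-direct"):
        return []
    return [name for name in EXCLUSIVE_WITH_DIRECT if options.get(name)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    mixed = {"-direct": "myjenkins:50000", "-url": "http://myjenkins:8080/"}
    print(find_option_conflicts(mixed))
    print(find_option_conflicts({"-url": "http://myjenkins:8080/"}))
```

In the thread, setting the URL variable produced exactly this kind of clash, which suggests that -direct was already being passed to the agent; a non-empty result from the check would flag that combination.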

gfrid commented 2 years ago

The problem is solved only by using WebSockets.

dduportal commented 1 year ago

Thanks @gfrid for explaining how to fix the issue!

slide commented 1 year ago

@dduportal I don't think that using websockets will work for everyone, so I don't think this is resolved.

dduportal commented 1 year ago

Hello @gfrid @slide do you still see the issue with the latest images?

slide commented 1 year ago

I would need to go back and test this more. I don't think I was able to reproduce it.

dduportal commented 1 year ago

Is this problem still present? Asking as no answer back from the reporter and I can't reproduce it on the Jenkins infra's Windows machines.