Waiting for agent to connect #612

Closed: gfrid closed this issue 1 year ago

gfrid commented 2 years ago

Jenkins and plugins versions report

Jenkins: 2.342
OS: Linux - 5.4.0-1060-aws

What Operating System are you using (both controller, and any agents involved in the problem)?

Windows Server 2019 Core node with K8s 1.21 on EKS

Reproduction steps

podTemplate(yaml: ''' apiVersion: v1 kind: Pod spec: containers:

You can use any MCR (Microsoft Container Registry) image, for example

Expected Results

The JNLP agent should connect to the master server. The pods are running:

k8s-test-429-l3k68-vsq05-3g4xl 2/2 Running

Actual Results

Waiting for agent to connect (30/1000): k8s-test-429-l3k68-vsq05-3g4xl
Waiting for agent to connect (40/1000): k8s-test-429-l3k68-vsq05-3g4xl

HTTP ERROR 404 Not Found

URI: /computer/k8s-test-429-l3k68-vsq05-3g4xl/logText/progressiveHtml (404 Not Found)

Anything else?

java.lang.IllegalStateException: Node was deleted, computer is null
    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(
    at hudson.slaves.SlaveComputer.lambda$_connect$0(
    at jenkins.util.ContextResettingExecutorService$
    at$
    at
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
    at

Apr 18, 2022 3:30:37 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Terminating Kubernetes instance for agent k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:37 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null: k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:37 PM INFO hudson.slaves.AbstractCloudSlave terminate FATAL: Computer for agent is null: k8s-test-429-l3k68-vsq05-3g4xl

Apr 18, 2022 3:30:46 PM WARNING io.fabric8.kubernetes.client.Config tryServiceAccount Error reading service account token from: [/var/run/secrets/]. Ignoring.

MarkEWaite commented 2 years ago

Remoting agent jenkins/agent:4.6-1-jdk8-windowsservercore-1809 is out of date.

Use jenkins/agent:4.13-1-jdk8-windowsservercore-1809

gfrid commented 2 years ago

There is no such image; I am now trying jenkins/inbound-agent:4.13-1-jdk11-nanoserver-1809

gfrid commented 2 years ago

[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulled] Successfully pulled image "jenkins/inbound-agent:4.13-1-jdk11-nanoserver-1809" in 5m0.0400562s
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Created] Created container jnlp
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Started] Started container jnlp
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulling] Pulling image ""
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Pulled] Successfully pulled image "" in 646.0898ms
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Created] Created container powershell
[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Started] Started container powershell
jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0 Container jnlp was terminated (Exit Code: 1, Reason: Error)
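When scanning a long event dump like the one above, the termination line is the part that matters. The following is a minimal sketch, assuming the events are available as plain text in the format shown in this comment; the helper name `terminated_containers` is hypothetical and not part of Jenkins or Kubernetes tooling.

```python
import re

# Matches the termination message format seen in the event output above.
TERMINATED = re.compile(
    r"Container (?P<name>\S+) was terminated "
    r"\(Exit Code: (?P<code>-?\d+), Reason: (?P<reason>[^)]+)\)"
)


def terminated_containers(event_text):
    """Return (container, exit_code, reason) for each termination message."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in TERMINATED.finditer(event_text)
    ]


# Two lines copied from the event output in this comment.
sample = (
    "[Normal][jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0][Started] Started container powershell\n"
    "jenkins/k8s-test-435-6xg4c-gfjgm-vjnv0 Container jnlp was terminated (Exit Code: 1, Reason: Error)\n"
)
print(terminated_containers(sample))
```

Here the only terminated container is `jnlp`, which is consistent with the agent never connecting while the `powershell` container keeps running.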

gfrid commented 2 years ago

The same happens with jenkins/inbound-agent:4.13-1-windowsservercore-ltsc2019. By the way, I am using a TCP connection.

SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null
INFO hudson.slaves.AbstractCloudSlave terminate FATAL: Computer for agent is null

gfrid commented 2 years ago

Something is definitely wrong with the Windows JNLP agent; the Linux JNLP agent runs correctly.

Terminating Kubernetes instance for agent k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:08:07 PM SEVERE org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate Computer for agent is null: k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:08:07 PM INFO hudson.slaves.AbstractCloudSlave terminate WINDOWS ----> FATAL: Computer for agent is null: k8s-test-450-qqfht-mlpxl-q869v
Apr 18, 2022 6:10:39 PM INFO hudson.slaves.NodeProvisioner update k8s-test-451-r1nl3-53xpb-k91zc provisioning successfully completed. We have now 4 computer(s)
Apr 18, 2022 6:10:39 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch Created Pod: kubernetes jenkins/k8s-test-451-r1nl3-53xpb-k91zc
Apr 18, 2022 6:10:42 PM INFO org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch Pod is running: kubernetes jenkins/k8s-test-451-r1nl3-53xpb-k91zc
Apr 18, 2022 6:10:43 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run LINUX ---> Accepted JNLP4-connect connection jenkinsci/docker-agent#726 from

gfrid commented 2 years ago

@MarkEWaite my EKS node is Datacenter 1809, build 10.0.17763.2686. The commenter from May 12th says he is reverting to 10.0.17763.1158; I am not sure whether that build is available on AWS Marketplace.

I will try the first available image; the earliest I can get is 10.0.17763.529.

That is not working either; this node version comes with k8s 1.11.

gfrid commented 2 years ago

Update: I got debug logs from the pod running jenkins/inbound-agent:4.13-1-windowsservercore-ltsc2019

Url is required
At C:\ProgramData\Jenkins\jenkins-agent.ps1:26 char:122

gfrid commented 2 years ago

Further investigation: without setting the URL as a variable, this is what I get:

Url is required
At C:\ProgramData\Jenkins\jenkins-agent.ps1:26 char:122

If I set the variable, this is what I get:

kubectl logs -c jnlp -f k8s-test-536-00mtt-vrgql-ddkm4 -n jenkins

option "-direct (-directConnection)" cannot be used with the option(s) [-url, -tunnel]
java -jar agent.jar [options...]
 -agentLog FILE : Local agent error log destination (overrides workDir)
 -cert VAL : Specify additional X.509 encoded PEM certificates to trust when connecting to Jenkins root URLs. If starting with @ then the remainder is assumed to be the name of the certificate file to read.
 -credentials USER:PASSWORD : HTTP BASIC AUTH header to pass in for making HTTP requests.
 -direct (-directConnection) HOST:PORT : Connect directly to this TCP agent port, skipping the HTTP(S) connection parameter download. For example, "myjenkins:50000".
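The error above says that -direct cannot be combined with -url or -tunnel. A minimal sketch of that one rule follows, useful for checking an argument list before it reaches the container. It assumes only what the error message states; the helper name `direct_conflicts` and the URL are hypothetical, and this is not how agent.jar itself validates options.

```python
# Options that the error message above reports as incompatible with -direct.
CONFLICTS_WITH_DIRECT = ("-url", "-tunnel")


def direct_conflicts(args):
    """Return the options in args that cannot be combined with -direct."""
    names = set(args)
    # The usage text lists -directConnection as an alias of -direct.
    if not ({"-direct", "-directConnection"} & names):
        return []
    return [opt for opt in CONFLICTS_WITH_DIRECT if opt in names]


# "myjenkins:50000" comes from the usage text; the URL is a placeholder.
print(direct_conflicts(["-direct", "myjenkins:50000", "-url", "http://jenkins.example/"]))
```

An empty result means the list does not mix -direct with -url or -tunnel. A non-empty result names the options to drop, which matches the situation in this comment where setting the URL variable triggered the conflict.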

gfrid commented 2 years ago

The problem is solved only by using WebSockets.

dduportal commented 1 year ago

Thanks @gfrid for explaining how to fix the issue!

slide commented 1 year ago

@dduportal I don't think that using websockets will work for everyone, so I don't think this is resolved.

dduportal commented 1 year ago

Hello @gfrid @slide do you still see the issue with the latest images?

slide commented 1 year ago

I would need to go back and test this more; I don't think I was able to reproduce it.

dduportal commented 1 year ago

Is this problem still present? Asking as no answer back from the reporter and I can't reproduce it on the Jenkins infra's Windows machines.