Open timja opened 5 years ago
An interesting and frustrating situation but I'm afraid I don't any good answers or suggestions for you.
Let me if I can summarize your situation. The agent log shows that the agent is connected. The master log shows that the agent has connected. But, the master UI is not showing the agent as connected and the system doesn't know it is connected to schedule builds on it. Does that describe it correctly?
If that's the case, it sounds like the connection is being made fine, but somehow it's not being added into all the required places inside the master. It's hard to guess what might be happening there.
I suggest combing through the master logs for anything relating to node, computer, agent, or slave to see if you can find any errors. You could try using the script console to poke around Jenkins.getNodes() and anything else related to see if you can find anything interesting.
Please report back with any results.
Last week we upgraded to 2.176.1 LTS, problem still happening
here is example on k8s this time
on 10:06 agent log says it is connected but server waits for 15 minutes it until timeout exceeded and than kill container
java.lang.IllegalStateException: Agent is not connected after 900 seconds, status: Running at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:193) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294) at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46) at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-06-28 10:21:55.279+0000 [id=13265377] FINER o.c.j.p.k.KubernetesLauncher#launch: Removing Jenkins node: northplains-xinet-centos6-cibuild-dmz04
slaveComputer == null || slaveComputer.isOffline() condition is true
seems in my initial post I overestimated fail rate, according to last logs I have on kubernetes fail rate is just 0.5%
*non* viene effettuata nessun'altra _formattazione_ qui
Hi all,
I've the same issue since yesterday. Suddenly only one type of agent of mine has stopped to work.
Have you an idea about the problem cause?
the log of the agent
kubectl logs worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b -c jnlp Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main createEngine INFO: Setting up agent: worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main$CuiListenerINFO: Jenkins agent is running in headless mode. Oct 04, 2019 11:58:21 AM hudson.remoting.Engine startEngine INFO: Using Remoting version: 3.27 Oct 04, 2019 11:58:21 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir INFO: Using /home/jenkins/agent/remoting as a remoting work directory Both error and output logs will be printed to /home/jenkins/agent/remoting Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Locating server among [http://jenkins:8080/] Oct 04, 2019 11:58:21 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping] Oct 04, 2019 11:58:21 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Agent discovery successful Agent address: jenkins-agent Agent port: 50000 Identity: 4e:a3:2e:fe:b5:3e:d1:26:ce:65:0f:3f:2c:86:9b:fc Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Handshaking Oct 04, 2019 11:58:21 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Connecting to jenkins-agent:50000 Oct 04, 2019 11:58:22 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Trying protocol: JNLP4-connect Oct 04, 2019 11:58:22 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Remote identity confirmed: 4e:a3:2e:fe:b5:3e:d1:26:ce:65:0f:3f:2c:86:9b:fc Oct 04, 2019 11:58:23 AM hudson.remoting.jnlp.Main$CuiListener status INFO: Connected
the log of jenkins master
2019-10-04 12:00:03.143+0000 [id=4588] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b, template=PodTemplate{, name='worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c', namespace='default', label='worker-bbb6979c-390b-4867-81c6-03686354689d', nodeUsageMode=EXCLUSIVE, containers=[ContainerTemplate{name='deploy-to-test', image='mycare-dev.icp:8500/default/vpn-icp-tools:latest', privileged=true, alwaysPullImage=true, command='/start.sh', resourceRequestCpu='50m', resourceRequestMemory='100Mi', resourceLimitCpu='200m', resourceLimitMemory='300Mi', envVars=[SecretEnvVar [secretName=vpn-mycare-secret, secretKey=vpnaddr, getKey()=VPNADDR], SecretEnvVar [secretName=vpn-mycare-secret, secretKey=vpnuser, getKey()=VPNUSER], SecretEnvVar [secretName=vpn-mycare-secret, secretKey=vpnpass, getKey()=VPNPASS], SecretEnvVar [secretName=vpn-mycare-secret, secretKey=vpnrdpip, getKey()=VPNRDPIP], SecretEnvVar [secretName=vpn-mycare-secret, secretKey=icpuser, getKey()=ICPUSERNAME], SecretEnvVar [secretName=vpn-mycare-secret, secretKey=icppasswd, getKey()=ICPPASSWD]]}], annotations=[org.csanchez.jenkins.plugins.kubernetes.PodAnnotation@aab9c821]} java.lang.IllegalStateException: Agent is not connected after 100 seconds, status: Running at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:193) at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297) at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46) at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-10-04 12:00:03.143+0000 [id=4588] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b 2019-10-04 12:00:03.181+0000 [id=4588] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent default/worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b 2019-10-04 12:00:03.181+0000 [id=4588] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b Terminated Kubernetes instance for agent default/worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b Disconnected computer worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-l8f6b 2019-10-04 12:00:09.146+0000 [id=24] INFO o.c.j.p.k.KubernetesCloud#provision: Excess workload after pending Kubernetes agents: 1 2019-10-04 12:00:09.146+0000 [id=24] INFO o.c.j.p.k.KubernetesCloud#provision: Template for label worker-bbb6979c-390b-4867-81c6-03686354689d: Kubernetes Pod Template 2019-10-04 12:00:19.171+0000 [id=39] INFO hudson.slaves.NodeProvisioner#lambda$update$6: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s) 2019-10-04 12:00:19.181+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Created Pod: default/worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:19.249+0000 [id=4646] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path? 2019-10-04 12:00:21.704+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Pod is running: default/worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:21.708+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (0/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:22.203+0000 [id=4648] INFO h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #83 from /10.1.19.42:42086 2019-10-04 12:00:22.711+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (1/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:23.716+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (2/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:24.724+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (3/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:25.728+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (4/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:26.734+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (5/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:27.737+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (6/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:28.741+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (7/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:29.755+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (8/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:30.758+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (9/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:31.762+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (10/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:32.767+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (11/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:33.773+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (12/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:34.783+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (13/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:35.788+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (14/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:36.795+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (15/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:37.799+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (16/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:38.813+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (17/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:39.824+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (18/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:40.828+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (19/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:41.833+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (20/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:42.837+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (21/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:43.842+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (22/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:44.847+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (23/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:45.855+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (24/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:46.873+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (25/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:47.877+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (26/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:48.882+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (27/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:49.891+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (28/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:50.895+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (29/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:51.899+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (30/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:52.903+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (31/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:53.907+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (32/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:54.915+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (33/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:55.927+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (34/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:56.937+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (35/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:57.941+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (36/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:58.945+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (37/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:00:59.954+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (38/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:00.964+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (39/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:01.967+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (40/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:02.974+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (41/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:03.978+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (42/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:04.986+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (43/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:05.992+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (44/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:06.996+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (45/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:08.000+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (46/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:09.003+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (47/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:10.029+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (48/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:11.032+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (49/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:12.037+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (50/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:13.041+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (51/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:14.045+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (52/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:15.051+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (53/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:16.054+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (54/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc 2019-10-04 12:01:17.061+0000 [id=4588] INFO o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (55/100): worker-bbb6979c-390b-4867-81c6-03686354689d-nhc1c-fjfjc
Hi Jenkins team/community
we started to observe that our cloud agents sometimes (roughly fail rate is
10%0.5% of launches) fail to come up online within first attempt. Here it a timeline and logs of such case:~13:40 agent was created by cloud plugin (VMware in this case, but same problem was observed on kubernetes cloud also, so i suggest cloud type is not important)
Here is agent log
here is part of server log related to same agent
but agent did not show up online at 13:40 as you can see on load statistic screenshot (0 executors available)
I waited 5 minutes and restarted jenkins windows service on agent VM at 13:46,
here is agent log
Server log
and after that agent connected and come online and started job execution , (see load statistic screenshot above)
We upgraded to 2.164.3 from 2.164.1 last week and noticed this issue but we are not 100% sure it wasn't there before upgrade.
Any Ideas or direction to troubleshoot further are appreciated
Thanks
Originally reported by iceiceice, imported from: agent connected via TCP but executor does not show up online sometimes