This PR adds synchronize to the launch method to prevent multiple threads simultaneously spinning up the same computer.
In a worse case scenario, simultaneous calls to launch will cause containers to spin up in ECS that will attempt to connect to Jenkins forever, but will fail with with the following traceback
INFO: [JNLP4-connect connection to redacted-ip] Local headers refused by remote: Unknown client name: redacted-computer-name
Sep 13, 2022 3:01:05 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Protocol JNLP4-connect encountered an unexpected exception
java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: redacted-computer-name
at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
at hudson.remoting.Engine.innerRun(Engine.java:799)
at hudson.remoting.Engine.run(Engine.java:544)
Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: redacted-computer-name
at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:380)
at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.onRecvClosed(ConnectionHeadersFilterLayer.java:435)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:289)
at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:168)
at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:825)
at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1500(BIONetworkLayer.java:51)
at org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:257)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:126)
at java.lang.Thread.run(Thread.java:750)
Suppressed: java.nio.channels.ClosedChannelException
... 7 more
Sep 13, 2022 3:01:05 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: reconnect rejected, sleeping 10s:
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
at hudson.remoting.Engine.onConnectionRejected(Engine.java:878)
at hudson.remoting.Engine.innerRun(Engine.java:825)
at hudson.remoting.Engine.run(Engine.java:544)
If we take a look at the logs from Jenkins, we get a better picture of what's going on.
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.148+0000 [id=10094300]#011INFO#011c.c.j.p.a.p.ECSDeclarativeAgent#getAsArgs: In getAsArgs. label: aJenkinsLabel-116-t03dc
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.148+0000 [id=10094300]#011INFO#011c.c.j.p.a.p.ECSDeclarativeAgent#getAsArgs: In getAsArgs. argMap: {assignPublicIp=false, cloud=JenkinsCloudName, cpu=1012, image=1111111.dkr.ecr.us-east-1.amazonaws.com/jenkins-agents:simple-agent, label=aJenkinsLabel-116-t03dc, memory=4050, name=aJenkinsLabel-116-t03dc, overrides=[cloud, cpu, image, memory, portMappings, label], portMappings=[PortMappingEntry{containerPort=22, hostPort=0, protocol='tcp}], privileged=false, uniqueRemoteFSRoot=false}
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.161+0000 [id=10094300]#011INFO#011c.c.j.p.a.p.ECSTaskTemplateStepExecution#start: Registering task template with name aJenkinsLabel-116-t03dc-7xc42
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.197+0000 [id=10094300]#011INFO#011c.c.j.p.amazonecs.ECSService#findTaskDefinition: No existing task definition found for family or ARN: aJenkinsJob-116-t03dc-7xc42
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.268+0000 [id=10094300]#011INFO#011c.c.j.p.amazonecs.ECSService#registerTemplate: Created Task Definition: {TaskDefinitionArn: arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1,ContainerDefinitions: [{Name: aJenkinsJob-116-t03dc-7xc42,Image: 1111111.dkr.ecr.us-east-1.amazonaws.com/jenkins-agents:simple-agent,Cpu: 1012,Memory: 4050,Links: [],PortMappings: [{ContainerPort: 22,HostPort: 0,Protocol: tcp}],Essential: true,EntryPoint: [],Command: [],Environment: [],EnvironmentFiles: [],MountPoints: [],VolumesFrom: [],Secrets: [],DependsOn: [],Privileged: false,DnsServers: [],DnsSearchDomains: [],ExtraHosts: [],DockerSecurityOptions: [],Ulimits: [],SystemControls: [],ResourceRequirements: [],}],Family: aJenkinsJob-116-t03dc-7xc42,Revision: 1,Volumes: [],Status: ACTIVE,RequiresAttributes: [{Name: com.amazonaws.ecs.capability.ecr-auth,}],PlacementConstraints: [],Compatibilities: [EXTERNAL, EC2],RequiresCompatibilities: [],InferenceAccelerators: [],RegisteredAt: Thu Sep 08 02:24:09 UTC 2022,RegisteredBy: arn:aws:sts::1111111:assumed-role/jenkins/i-0,}
Sep 8 02:24:09 jenkins jenkins[86776]: 2022-09-08 02:24:09.268+0000 [id=10094300]#011INFO#011c.c.j.plugins.amazonecs.ECSCloud#addDynamicTemplate: Task definition created or found: ARN: arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1
Sep 8 02:24:15 jenkins jenkins[86776]: 2022-09-08 02:24:15.984+0000 [id=147]#011INFO#011c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: aJenkinsLabel-116-t03dc
Sep 8 02:24:15 jenkins jenkins[86776]: 2022-09-08 02:24:15.985+0000 [id=147]#011INFO#011c.c.j.plugins.amazonecs.ECSCloud#provision: Will provision aJenkinsJob-116-t03dc-th5x5, for label: aJenkinsLabel-116-t03dc
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.074+0000 [id=139]#011INFO#011hudson.slaves.NodeProvisioner#update: aJenkinsJob-116-t03dc-th5x5 provisioning successfully completed. We have now 75 computer(s)
everything is normal so far, note "Asked to provision 1 agent(s)"
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.095+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#runECSTask: [aJenkinsJob-116-t03dc-th5x5]: Starting agent with task definition arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1}
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.199+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#runECSTask: [aJenkinsJob-116-t03dc-th5x5]: Agent started with task arn : arn:aws:ecs:us-east-1:1111111:task/ECSClusterName/45bc
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.199+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: TaskArn: arn:aws:ecs:us-east-1:1111111:task/ECSClusterName/45bc
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.199+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: TaskDefinitionArn: arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.199+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: ClusterArn: arn:aws:ecs:us-east-1:1111111:cluster/ECSClusterName
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.199+0000 [id=10094155]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: ContainerInstanceArn: arn:aws:ecs:us-east-1:1111111:container-instance/ECSClusterName/7f6c
here we've started a task `45bc` for the label ending in `th5x5`
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.228+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#runECSTask: [aJenkinsJob-116-t03dc-th5x5]: Starting agent with task definition arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1}
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.349+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#runECSTask: [aJenkinsJob-116-t03dc-th5x5]: Agent started with task arn : arn:aws:ecs:us-east-1:1111111:task/ECSClusterName/f42e
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.350+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: TaskArn: arn:aws:ecs:us-east-1:1111111:task/ECSClusterName/f42e
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.350+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: TaskDefinitionArn: arn:aws:ecs:us-east-1:1111111:task-definition/aJenkinsJob-116-t03dc-7xc42:1
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.350+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: ClusterArn: arn:aws:ecs:us-east-1:1111111:cluster/ECSClusterName
Sep 8 02:24:26 jenkins jenkins[86776]: 2022-09-08 02:24:26.350+0000 [id=10094150]#011INFO#011c.c.j.p.amazonecs.ECSLauncher#launchECSTask: [aJenkinsJob-116-t03dc-th5x5]: ContainerInstanceArn: arn:aws:ecs:us-east-1:1111111:container-instance/ECSClusterName/36c5
uh oh... just a fraction of a second later we've spun up another task, `f42e`, for the same label `th5x5`
what we've actually done is overwrite the task `45bc` with `f42e`, and now when `45bc` attempts to connect it will forever be rejected by Jenkins.
This PR adds
synchronize
to the launch method to prevent multiple threads simultaneously spinning up the same computer.In a worse case scenario, simultaneous calls to launch will cause containers to spin up in ECS that will attempt to connect to Jenkins forever, but will fail with with the following traceback
If we take a look at the logs from Jenkins, we get a better picture of what's going on.
An identical commit has been made to the kubernetes plugin, it seems that they have suffered from similar issues in the past. https://github.com/jenkinsci/kubernetes-plugin/commit/3e0fd5045caa97962ac499e8f0a87776a26ff0c8