jenkinsci / amazon-ecs-plugin

Amazon EC2 Container Service Plugin for Jenkins
https://plugins.jenkins.io/amazon-ecs
MIT License

Agents on ECS don't always get removed when done. #295

Open jonbrohauge opened 1 year ago

jonbrohauge commented 1 year ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.378
OS: Linux - 5.4.117-58.216.amzn2.x86_64
---
Office-365-Connector:4.17.0
ace-editor:1.1
active-directory:2.27
amazon-ecr:1.107.ve50d37906739
amazon-ecs:1.46
analysis-model-api:10.20.0
ansicolor:1.0.2
ant:481.v7b_09e538fcca
antisamy-markup-formatter:155.v795fb_8702324
anything-goes-formatter:1.0
apache-httpcomponents-client-4-api:4.5.13-138.v4e7d9a_7b_a_e61
artifactory:3.17.3
authentication-tokens:1.4
authorize-project:1.4.0
aws-credentials:191.vcb_f183ce58b_9
aws-java-sdk:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-cloudformation:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-codebuild:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-ec2:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-ecr:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-ecs:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-efs:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-elasticbeanstalk:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-iam:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-logs:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-minimal:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-sns:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-sqs:1.12.287-357.vf82d85a_6eefd
aws-java-sdk-ssm:1.12.287-357.vf82d85a_6eefd
azure-ad:267.v5b_dfb_514d9fd
azure-sdk:118.v43f74dd9ca_dc
badge:1.9.1
basic-branch-build-strategies:71.vc1421f89888e
blueocean:1.25.8
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.25.8
blueocean-commons:1.25.8
blueocean-config:1.25.8
blueocean-core-js:1.25.8
blueocean-dashboard:1.25.8
blueocean-display-url:2.4.1
blueocean-events:1.25.8
blueocean-git-pipeline:1.25.8
blueocean-github-pipeline:1.25.8
blueocean-i18n:1.25.8
blueocean-jira:1.25.8
blueocean-jwt:1.25.8
blueocean-personalization:1.25.8
blueocean-pipeline-api-impl:1.25.8
blueocean-pipeline-editor:1.25.8
blueocean-pipeline-scm-api:1.25.8
blueocean-rest:1.25.8
blueocean-rest-impl:1.25.8
blueocean-web:1.25.8
bootstrap5-api:5.2.1-3
bouncycastle-api:2.26
branch-api:2.1051.v9985666b_f6cc
build-monitor-plugin:1.13+build.202205140447
build-timeout:1.24
build-user-vars-plugin:1.9
caffeine-api:2.9.3-65.v6a_47d0f4d1fe
checks-api:1.8.0
cloudbees-bitbucket-branch-source:791.vb_eea_a_476405b
cloudbees-folder:6.795.v3e23d3c6f194
command-launcher:90.v669d7ccb_7c31
commons-lang3-api:3.12.0-36.vd97de6465d5b_
commons-text-api:1.10.0-27.vb_fa_3896786a_7
config-file-provider:3.11.1
configuration-as-code:1569.vb_72405b_80249
cors-filter:1.1
credentials:1214.v1de940103927
credentials-binding:523.vd859a_4b_122e6
cucumber-reports:5.7.4
data-tables-api:1.12.1-4
dependency-track:4.2.0
display-url-api:2.3.6
docker-commons:1.21
docker-workflow:528.v7c193a_0b_e67c
durable-task:501.ve5d4fc08b0be
echarts-api:5.4.0-1
email-ext:2.92
external-monitor-job:203.v683c09d993b_9
favorite:2.4.1
flyway-runner:1.9
font-awesome-api:6.2.0-3
forensics-api:1.16.0
git:4.13.0
git-client:3.13.0
git-server:99.va_0826a_b_cdfa_d
github:1.36.0
github-api:1.303-400.v35c2d8258028
github-branch-source:1696.v3a_7603564d04
gradle:2.1.1
groovy-postbuild:2.5
h2-api:1.4.199
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
hashicorp-vault-plugin:359.v2da_3b_45f17d5
htmlpublisher:1.31
instance-identity:116.vf8f487400980
ionicons-api:31.v4757b_6987003
ivy:2.2
jackson2-api:2.13.4.20221013-295.v8e29ea_354141
jacoco:3.3.2
jakarta-activation-api:2.0.1-2
jakarta-mail-api:2.0.1-2
javadoc:226.v71211feb_e7e9
javax-activation-api:1.2.0-5
javax-mail-api:1.6.2-8
jaxb:2.3.7-1
jdk-tool:63.v62d2fd4b_4793
jenkins-design-language:1.25.8
jersey2-api:2.37-1
jira:3.8
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.15-2
job-dsl:1.81
jobcacher:301.v06a_c88b_0fa_f8
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.6.1-2
jsch:0.1.55.61.va_e9ee26616e7
junit:1160.vf1f01a_a_ea_b_7f
junit-attachments:101.v82f494a_00e9e
kubernetes-cli:1.10.3
kubernetes-client-api:5.12.2-193.v26a_6078f65a_9
kubernetes-credentials:0.9.0
leapwork:4.0.7
lockable-resources:2.18
logstash:2.5.0205.vd05825ed46bd
mailer:438.v02c7f0a_12fa_4
mapdb-api:1.0.9-28.vf251ce40855d
matrix-auth:3.1.5
matrix-project:785.v06b_7f47b_c631
maven-plugin:3.20
mercurial:1260.vdfb_723cdcc81
metrics:4.2.10-405.v60a_9cc74e923
mina-sshd-api-common:2.9.1-44.v476733c11f82
mina-sshd-api-core:2.9.1-44.v476733c11f82
momentjs:1.1.1
nodejs:1.5.1
okhttp-api:4.9.3-108.v0feda04578cf
pam-auth:1.10
parameterized-scheduler:1.1
performance:3.20
pipeline-build-step:2.18
pipeline-github-lib:38.v445716ea_edda_
pipeline-graph-analysis:195.v5812d95a_a_2f9
pipeline-groovy-lib:613.v9c41a_160233f
pipeline-input-step:456.vd8a_957db_5b_e9
pipeline-maven:1226.v833b_d9f526b_9
pipeline-milestone-step:101.vd572fef9d926
pipeline-model-api:2.2118.v31fd5b_9944b_5
pipeline-model-definition:2.2118.v31fd5b_9944b_5
pipeline-model-extensions:2.2118.v31fd5b_9944b_5
pipeline-rest-api:2.27
pipeline-stage-step:296.v5f6908f017a_5
pipeline-stage-tags-metadata:2.2118.v31fd5b_9944b_5
pipeline-stage-view:2.27
pipeline-utility-steps:2.13.2
plain-credentials:139.ved2b_9cf7587b
plugin-util-api:2.18.0
popper2-api:2.11.6-2
powershell:1.7
prism-api:1.29.0-1
pubsub-light:1.17
pyenv-pipeline:2.1.2
resource-disposer:0.20
run-condition:1.5
scm-api:621.vda_a_b_055e58f7
scmskip:1.0.3
script-security:1190.v65867a_a_47126
sidebar-link:2.2.0
snakeyaml-api:1.33-90.v80dcb_3814d35
sonar:2.14
sse-gateway:1.26
ssh-credentials:305.v8f4381501156
ssh-slaves:2.854.v7fd446b_337c9
sshd:3.249.v2dc2ea_416e33
startup-trigger-plugin:2.9.3
structs:324.va_f5d6774f3a_d
timestamper:1.21
token-macro:308.v4f2b_ed62b_b_16
trilead-api:2.72.v2a_3236754f73
variant:59.vf075fe829ccb
warnings-ng:9.20.1
windows-slaves:1.8.1
workflow-aggregator:590.v6a_d052e5a_a_b_5
workflow-api:1200.v8005c684b_a_c6
workflow-basic-steps:994.vd57e3ca_46d24
workflow-cps:3536.vb_8a_6628079d5
workflow-durable-task-step:1210.va_1e5d77e122b
workflow-job:1254.v3f64639b_11dd
workflow-multibranch:716.vc692a_e52371b_
workflow-scm-step:400.v6b_89a_1317c9a_
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:839.v35e2736cfd5c
ws-cleanup:0.43
```

What Operating System are you using (both controller, and any agents involved in the problem)?

The Jenkins controller is based on the latest official Docker image, jenkins/jenkins:2.378. The agents are based on the latest official Docker image, jenkins/inbound-agent:3071.v7e9b_0dc08466-1.

Due to our enterprise architecture, we have slightly modified the images our containers are based on: custom certificates, an in-house artifact server, and the like.

Reproduction steps

  1. Start a multibranch pipeline job
  2. Wait until it is done.
  3. Verify node has been removed from Jenkins
  4. Verify node still exists in AWS ECS having status RUNNING
  5. Look in the Jenkins log

Expected Results

Log output:

Nov 17, 2022 9:07:06 AM INFO hudson.slaves.CloudRetentionStrategy check
Disconnecting ecs-ecs2-7999w
Nov 17, 2022 9:07:06 AM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSSlave _terminate
[ecs-ecs2-7999w]: Stopping: TaskArn arn:aws:ecs:eu-west-1:912774568938:task/CI-build-agents/e4d3a260d79d45e2b155a5ebe5e28d06, ClusterArn arn:aws:ecs:eu-west-1:912774568938:cluster/CI-build-agents

...

Computer.threadPoolForRemoting [#19] for ecs-ecs2-7999w terminated: java.nio.channels.ClosedChannelException

Actual Results

Log output:

Disconnecting ecs-ecs8-cb764

Nov 17, 2022 9:12:49 AM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSSlave _terminate
[ecs-ecs8-cb764]: Stopping: TaskArn ecs-ecs8-cb764, ClusterArn arn:aws:::cluster/
Nov 17, 2022 9:12:49 AM INFO com.cloudbees.jenkins.plugins.amazonecs.ECSService stopTask
Delete ECS agent task: ecs-ecs8-cb764
Nov 17, 2022 9:12:49 AM SEVERE com.cloudbees.jenkins.plugins.amazonecs.ECSService stopTask
Couldn't stop task arn ecs-ecs8-cb764 caught exception: taskId length should be one of [32,36] (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 9445cd94-93bd-4d43-b94f-7be867fe4f2b; Proxy: null)
com.amazonaws.services.ecs.model.InvalidParameterException: taskId length should be one of [32,36] (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 9445cd94-93bd-4d43-b94f-7be867fe4f2b; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.ecs.AmazonECSClient.doInvoke(AmazonECSClient.java:4691)
    at com.amazonaws.services.ecs.AmazonECSClient.invoke(AmazonECSClient.java:4658)
    at com.amazonaws.services.ecs.AmazonECSClient.invoke(AmazonECSClient.java:4647)
    at com.amazonaws.services.ecs.AmazonECSClient.executeStopTask(AmazonECSClient.java:3447)
    at com.amazonaws.services.ecs.AmazonECSClient.stopTask(AmazonECSClient.java:3416)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSService.stopTask(ECSService.java:164)
    at com.cloudbees.jenkins.plugins.amazonecs.ECSSlave._terminate(ECSSlave.java:146)
    at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:88)
    at hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:61)
    at hudson.slaves.CloudRetentionStrategy.check(CloudRetentionStrategy.java:45)
    at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:71)
    at hudson.model.Queue._withLock(Queue.java:1396)
    at hudson.model.Queue.withLock(Queue.java:1270)
    at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:62)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:94)
    at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:69)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Nov 17, 2022 9:12:49 AM INFO jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
Computer.threadPoolForRemoting [#99] for ecs-ecs8-cb764 terminated: java.nio.channels.ClosedChannelException

...

[JNLP4-connect connection from /:] Refusing headers from remote: Unknown client name: ecs-ecs8-cb764

...
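Comparing the two logs, the failing run appears to pass the agent name (ecs-ecs8-cb764) where the successful run passes a full task ARN, and the ClusterArn is effectively empty (arn:aws:::cluster/). ECS then rejects the request because the task ID is not 32 or 36 characters long. The following is only a minimal sketch of the kind of guard that would catch this before calling stopTask; it is not the plugin's code, and the class and method names (`EcsTaskIdCheck`, `taskId`, `isPlausibleTaskId`) are hypothetical. The only rule it encodes is the length check quoted in the error message.

```java
public final class EcsTaskIdCheck {

    private EcsTaskIdCheck() {}

    // Returns the part after the last '/', or the whole string if there is no '/'.
    // For an ARN such as arn:aws:ecs:<region>:<account>:task/<cluster>/<id> this is <id>.
    public static String taskId(String taskArnOrId) {
        if (taskArnOrId == null) {
            return "";
        }
        int slash = taskArnOrId.lastIndexOf('/');
        return slash >= 0 ? taskArnOrId.substring(slash + 1) : taskArnOrId;
    }

    // Mirrors the rule from the ECS error message: "taskId length should be one of [32,36]".
    public static boolean isPlausibleTaskId(String taskArnOrId) {
        int length = taskId(taskArnOrId).length();
        return length == 32 || length == 36;
    }

    public static void main(String[] args) {
        // Task ARN from the successful run in "Expected Results".
        String good = "arn:aws:ecs:eu-west-1:912774568938:task/CI-build-agents/"
                + "e4d3a260d79d45e2b155a5ebe5e28d06";
        // Value logged as TaskArn in the failing run; it is the agent name, not an ARN.
        String bad = "ecs-ecs8-cb764";

        System.out.println(good + " -> " + isPlausibleTaskId(good));
        System.out.println(bad + " -> " + isPlausibleTaskId(bad));
    }
}
```

With the values from the logs above, the first check passes and the second fails, which matches what we see: the task is never stopped and stays RUNNING in ECS.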

Anything else?

This does not happen every time, but it has occurred very often since we upgraded to plugin version 1.46. We could not reproduce the issue with version 1.41.