jenkinsci / google-compute-engine-plugin

https://plugins.jenkins.io/google-compute-engine/
Apache License 2.0

Jenkins slaves connection fails randomly #299

Open mikiedelstein opened 2 years ago

mikiedelstein commented 2 years ago

Jenkins and plugins versions report

Environment Jenkins: 2.327 OS: Linux - 5.4.0-1058-gcp --- CustomHistory:1.6 ace-editor:1.1 ansicolor:1.0.1 ant:1.13 antisamy-markup-formatter:2.6 apache-httpcomponents-client-4-api:4.5.13-1.0 artifactory:3.14.2 authentication-tokens:1.4 authorize-project:1.4.0 aws-credentials:1.33 aws-java-sdk-ec2:1.12.131-302.vbef9650c6521 aws-java-sdk-minimal:1.12.131-302.vbef9650c6521 badge:1.9 bitbucket:214.v2fd4234d0554 bitbucket-push-and-pull-request:2.8.1 bootstrap4-api:4.6.0-3 bootstrap5-api:5.1.3-4 bouncycastle-api:2.25 branch-api:2.7.0 build-environment:1.7 build-monitor-plugin:1.13+build.202112271752 build-name-setter:2.2.0 build-pipeline-plugin:1.5.8 build-timeout:1.20 build-timestamp:1.0.3 build-user-vars-plugin:1.8 caffeine-api:2.9.2-29.v717aac953ff3 checks-api:1.7.2 chucknorris:1.4 cloudbees-folder:6.17 command-launcher:1.6 conditional-buildstep:1.4.1 config-file-provider:3.8.2 copyartifact:1.46.2 credentials:1055.v1346ba467ba1 credentials-binding:1.27 dashboard-view:2.18 display-url-api:2.3.5 docker-build-step:2.8 docker-commons:1.17 docker-java-api:3.1.5.2 docker-plugin:1.2.6 docker-workflow:1.26 durable-task:493.v195aefbb0ff2 echarts-api:5.2.2-2 email-ext:2.86 embeddable-build-status:2.0.3 extended-choice-parameter:0.82 extensible-choice-parameter:1.8.0 external-monitor-job:1.7 extra-columns:1.25 font-awesome-api:5.15.4-5 gcloud-sdk:0.0.3 generic-webhook-trigger:1.79 git:4.10.1 git-client:3.11.0 git-parameter:0.9.14 git-server:1.10 git-tag-message:1.7.1 github:1.34.1 github-api:1.301-378.v9807bd746da5 github-branch-source:2.11.4 google-compute-engine:4.3.8 google-container-registry-auth:0.3 google-kubernetes-engine:0.8.6 google-login:1.6 google-oauth-plugin:1.0.6 gradle:1.37.1 groovy-postbuild:2.5 handlebars:3.0.8 hidden-parameter:0.0.4 htmlpublisher:1.28 ivy:2.1 jackson2-api:2.13.1-244.v773c36c5b330 javadoc:1.6 jaxb:2.3.0 jdk-tool:1.5 jjwt-api:0.11.2-9.c8b45b8bb173 job-dsl:1.78.3 jobConfigHistory:2.31-rc1098.b666422863b2 jquery:1.12.4-1 jquery-detached:1.2.1 
jquery-ui:1.0.2 jquery3-api:3.6.0-2 jsch:0.1.55.2 junit:1.53 kubernetes:1.31.1 kubernetes-client-api:5.10.1-171.vaa0774fb8c20 kubernetes-credentials:0.9.0 ldap:2.7 lockable-resources:2.13 mailer:1.34 matrix-auth:3.0 matrix-project:1.19 maven-plugin:3.16 mercurial:2.16 metrics:4.0.2.8 momentjs:1.1.1 nodejs:1.4.3 oauth-credentials:0.5 okhttp-api:4.9.3-105.vb96869f8ac3a pam-auth:1.6.1 parameter-separator:1.3 parameterized-trigger:2.43 periodicbackup:1.7 pipeline-build-step:2.15 pipeline-github-lib:1.0 pipeline-graph-analysis:188.v3a01e7973f2c pipeline-input-step:427.va6441fa17010 pipeline-milestone-step:1.3.2 pipeline-model-api:1.9.3 pipeline-model-declarative-agent:1.1.1 pipeline-model-definition:1.9.3 pipeline-model-extensions:1.9.3 pipeline-rest-api:2.20 pipeline-stage-step:291.vf0a8a7aeeb50 pipeline-stage-tags-metadata:1.9.3 pipeline-stage-view:2.20 pipeline-utility-steps:2.11.0 plain-credentials:1.7 plugin-util-api:2.10.0 popper-api:1.16.1-2 popper2-api:2.11.0-1 publish-over:0.22 publish-over-ssh:1.22 purge-job-history:1.6 pwauth:0.4 readonly-parameters:1.0.0 rebuild:1.32 resource-disposer:0.17 role-strategy:3.2.0 run-condition:1.5 saml:2.0.9 scm-api:2.6.5 script-security:1118.vba21ca2e3286 slack:2.49 snakeyaml-api:1.29.1 ssh:2.6.1 ssh-agent:1.23 ssh-credentials:1.19 ssh-slaves:1.33.0 sshd:3.1.0 stashNotifier:1.24 structs:308.v852b473a2b8c summary_report:1.15 throttle-concurrents:2.6 timestamper:1.15 token-macro:267.vcdaea6462991 trilead-api:1.0.13 uno-choice:2.5.7 variant:1.4 view-job-filters:2.3 windows-slaves:1.8 workflow-aggregator:2.6 workflow-api:1108.v57edf648f5d4 workflow-basic-steps:2.24 workflow-cps:2648.va9433432b33c workflow-cps-global-lib:552.vd9cc05b8a2e1 workflow-durable-task-step:1112.vda00e6febcc1 workflow-job:1145.v7f2433caa07f workflow-multibranch:696.v52535c46f4c9 workflow-scm-step:2.13 workflow-step-api:615.vb09dac339255 workflow-support:804.vba10a18a1476 ws-cleanup:0.40

What Operating System are you using (both controller, and any agents involved in the problem)?

Jenkins randomly deletes slaves before their job runs complete. We could not establish any pattern for when it happens; as far as I can tell, it is not time dependent.

We get the following errors:

Caused: java.io.IOException: Unexpected termination of the channel

Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@60d32f79:jenkins-jgqlcq": Remote call on jenkins-jgqlcq failed. The channel is closing down or has closed down

I can see in GCP Log Explorer that Jenkins sends a v1.compute.instances.delete to the node when this happens; however, I cannot find any definition for it.
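For anyone trying to trace the same delete calls, here is a minimal sketch that builds a Log Explorer query for them. It assumes the deletes show up as Cloud Audit Log entries with the usual `protoPayload.methodName` and `protoPayload.resourceName` fields. The helper name `build_delete_filter` is made up for illustration, and the instance name is the one from the error above.

```python
def build_delete_filter(instance_name=None):
    """Build a Logs Explorer query string for instance delete calls.

    Assumption: the delete is recorded as an audit log entry on a
    gce_instance resource. The method name is the one quoted in the issue.
    """
    clauses = [
        'resource.type="gce_instance"',
        'protoPayload.methodName="v1.compute.instances.delete"',
    ]
    if instance_name:
        # ":" is the substring ("has") operator in the logging query language.
        clauses.append(f'protoPayload.resourceName:"instances/{instance_name}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Paste the printed query into Log Explorer.
    print(build_delete_filter("jenkins-jgqlcq"))
```

In a matching entry, `protoPayload.authenticationInfo.principalEmail` should show which account issued the delete, which helps tell one Jenkins controller or cloud configuration from another.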

I set the retention time to 500 just to rule it out as the cause; the launch timeout is 300.

Reproduction steps

  1. GCP spins up a slave instance to run a job.
  2. At random, the instance may (or may not) receive a v1.compute.instances.delete from Jenkins.

Expected Results

All slaves should finish their runs without disconnecting

Actual Results

Slaves randomly disconnect without any visible pattern

Anything else?

No response

BrianRossmajer commented 2 years ago

I recently had this same issue come up and am just starting to investigate. It coincided with starting to use Jenkins' Configuration as Code plugin; I'm curious whether you're using that too.

BrianRossmajer commented 2 years ago

I'll just comment here in case it helps someone in the future... it did indeed seem related to using Jenkins' Configuration as Code. I had exported the configuration, edited it, and then duplicated it for something else, not noticing that the export kept the instanceId for the cloud configuration. As a result, two different cloud configurations were using the same instanceId. Note that the example yaml does not include the instanceId... once I removed it, the random shutdowns stopped.
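A quick way to check an exported configuration for this mistake is to scan it for repeated instanceId values. The sketch below is only an illustration: it does a plain line-by-line text scan rather than parsing the YAML properly, and the sample layout, cloud names, and ID values are made up, so your own export may look different.

```python
import re
from collections import defaultdict

# Matches a line such as `instanceId: "abc"` or `- instanceId: abc`.
_INSTANCE_ID = re.compile(r"""^\s*(?:-\s*)?instanceId:\s*["']?([^"'\s#]+)""")


def find_duplicate_instance_ids(yaml_text):
    """Return {instanceId: [line numbers]} for ids that appear more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = _INSTANCE_ID.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {iid: lines for iid, lines in seen.items() if len(lines) > 1}


# Hypothetical export: two clouds that accidentally share one instanceId.
SAMPLE = """\
jenkins:
  clouds:
  - computeEngine:
      cloudName: "gce-a"
      instanceId: "11111111-aaaa"
  - computeEngine:
      cloudName: "gce-b"
      instanceId: "11111111-aaaa"
"""

if __name__ == "__main__":
    for iid, lines in find_duplicate_instance_ids(SAMPLE).items():
        print(f"instanceId {iid} is reused on lines {lines}")
```

If the scan reports a reused value, removing the instanceId lines from the copied configuration (as described above) is the fix that worked here.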