jenkinsci / hetzner-cloud-plugin

Hetzner cloud integration for Jenkins
https://plugins.jenkins.io/hetzner-cloud/
Apache License 2.0

Jenkins does not clean up Hetzner nodes with shutdown policy respecting billing hours after a Jenkins restart #36

Closed sandrinr closed 2 years ago

sandrinr commented 2 years ago

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.346.1
OS: Linux - 5.4.0-66-generic
---
ace-editor:1.1
analysis-model-api:10.12.0
ansicolor:1.0.1
ant:475.vf34069fef73c
antisamy-markup-formatter:2.7
apache-httpcomponents-client-4-api:4.5.13-1.0
authentication-tokens:1.4
blueocean:1.25.5
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.25.5
blueocean-commons:1.25.5
blueocean-config:1.25.5
blueocean-core-js:1.25.5
blueocean-dashboard:1.25.5
blueocean-display-url:2.4.1
blueocean-events:1.25.5
blueocean-git-pipeline:1.25.5
blueocean-github-pipeline:1.25.5
blueocean-i18n:1.25.5
blueocean-jira:1.25.5
blueocean-jwt:1.25.5
blueocean-personalization:1.25.5
blueocean-pipeline-api-impl:1.25.5
blueocean-pipeline-editor:1.25.5
blueocean-pipeline-scm-api:1.25.5
blueocean-rest:1.25.5
blueocean-rest-impl:1.25.5
blueocean-web:1.25.5
bootstrap4-api:4.6.0-5
bootstrap5-api:5.1.3-7
bouncycastle-api:2.26
branch-api:2.1046.v0ca_37783ecc5
build-blocker-plugin:1.7.8
build-monitor-plugin:1.13+build.202205140447
build-timeout:1.21
caffeine-api:2.9.3-65.v6a_47d0f4d1fe
categorized-view:1.12
checks-api:1.7.4
claim:2.18.2
cloud-stats:0.27
cloudbees-bitbucket-branch-source:773.v4b_9b_005b_562b_
cloudbees-folder:6.729.v2b_9d1a_74d673
cobertura:1.17
code-coverage-api:3.0.0
command-launcher:84.v4a_97f2027398
conditional-buildstep:1.4.2
config-file-provider:3.10.0
copyartifact:1.46.4
credentials:1129.vef26f5df883c
credentials-binding:523.vd859a_4b_122e6
dark-theme:185.v276b_5a_8966a_e
dashboard-view:2.432.va_712ce35862d
data-tables-api:1.12.1-2
display-url-api:2.3.6
docker-commons:1.19
docker-compose-build-step:1.0
docker-custom-build-environment:1.7.3
docker-java-api:3.2.13-37.vf3411c9828b9
docker-plugin:1.2.9
docker-slaves:1.0.7
docker-workflow:1.29
downstream-build-cache:1.7
durable-task:496.va67c6f9eefa7
echarts-api:5.3.3-1
email-ext:2.89
envinject-api:1.199.v3ce31253ed13
extended-read-permission:3.2
external-monitor-job:191.v363d0d1efdf8
extra-columns:1.25
favorite:2.4.1
font-awesome-api:6.1.1-1
forensics-api:1.15.1
git:4.11.3
git-client:3.11.0
git-server:1.11
github:1.34.4
github-api:1.303-400.v35c2d8258028
github-branch-source:1637.vd833b_7ca_7654
gitlab-api:5.0.1-74.v44f46b_54c775
gitlab-branch-source:628.ve99e3d4df4b_8
gitlab-plugin:1.5.34
gradle:1.39.1
greenballs:1.15.1
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-22.v77d5b_75e6953
heavy-job:1.1
hetzner-cloud:47.v88af64711112
htmlpublisher:1.30
jackson2-api:2.13.3-285.vc03c0256d517
jacoco:3.3.2
javadoc:217.v905b_86277a_2a_
javax-activation-api:1.2.0-3
javax-mail-api:1.6.2-6
jaxb:2.3.6-1
jdk-tool:1.5
jenkins-design-language:1.25.5
jersey2-api:2.36-2
jira:3.7.1
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.7-3
job-import-plugin:3.5
jobConfigHistory:1148.v8607da_ef251e
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.6.0-4
jsch:0.1.55.2
junit:1119.1121.vc43d0fc45561
ldap:2.10
lockable-resources:2.15
mailer:414.vcc4c33714601
mapdb-api:1.0.9.0
matrix-auth:3.1.3
matrix-project:771.v574584b_39e60
matrix-reloaded:1.1.3
maven-plugin:3.19
metrics:4.1.6.2
mina-sshd-api-common:2.8.0-21.v493b_6b_db_22c6
mina-sshd-api-core:2.8.0-21.v493b_6b_db_22c6
momentjs:1.1.1
naginator:1.18.1
nodelabelparameter:1.11.0
okhttp-api:4.9.3-105.vb96869f8ac3a
pam-auth:1.8
parameterized-scheduler:1.0
parameterized-trigger:2.44
pipeline-build-step:2.18
pipeline-graph-analysis:195.v5812d95a_a_2f9
pipeline-groovy-lib:593.va_a_fc25d520e9
pipeline-input-step:449.v77f0e8b_845c4
pipeline-milestone-step:101.vd572fef9d926
pipeline-model-api:2.2097.v33db_b_de764b_e
pipeline-model-definition:2.2097.v33db_b_de764b_e
pipeline-model-extensions:2.2097.v33db_b_de764b_e
pipeline-stage-step:293.v200037eefcd5
pipeline-stage-tags-metadata:2.2097.v33db_b_de764b_e
pipeline-utility-steps:2.13.0
plain-credentials:1.8
plot:2.1.10
plugin-util-api:2.17.0
popper-api:1.16.1-3
popper2-api:2.11.5-2
postbuildscript:3.1.0-375.v3db_cd92485e1
prism-api:1.28.0-2
promoted-builds:876.v99d29788b_36b_
publish-over:0.22
pubsub-light:1.16
python:1.3
resource-disposer:0.19
run-condition:1.5
scm-api:608.vfa_f971c5a_a_e9
script-security:1175.v4b_d517d6db_f0
scriptler:3.5
slack:616.v03b_1e98d13dd
snakeyaml-api:1.30.1
sse-gateway:1.25
ssh-agent:295.v9ca_a_1c7cc3a_a_
ssh-credentials:277.v95c2fec1c047
ssh-slaves:1.821.vd834f8a_c390e
sshd:3.242.va_db_9da_b_26a_c3
structs:318.va_f3ccb_729b_71
subversion:2.15.5
text-finder:1.19
theme-manager:1.4
thinBackup:1.10
throttle-concurrents:2.8
timestamper:1.18
token-macro:293.v283932a_0a_b_49
translation:1.16
trilead-api:1.57.v6e90e07157e1
variant:1.4
view-job-filters:2.3
warnings-ng:9.13.0
windows-slaves:1.8.1
workflow-api:1164.v760c223ddb_32
workflow-basic-steps:948.v2c72a_091b_b_68
workflow-cps:2725.v7b_c717eb_12ce
workflow-cps-global-lib:588.v576c103a_ff86
workflow-durable-task-step:1146.v1a_d2e603f929
workflow-job:1189.va_d37a_e9e4eda_
workflow-multibranch:716.vc692a_e52371b_
workflow-scm-step:400.v6b_89a_1317c9a_
workflow-step-api:625.vd896b_f445a_f8
workflow-support:820.vd1a_6cc65ef33
ws-cleanup:0.42
yet-another-build-visualizer:1.16
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller: Ubuntu
Agents: Ubuntu

Reproduction steps

  1. In "Manage Clouds" configure your Hetzner node's shutdown policy to "Remove idle server just before billing cycle hour completes"
  2. Start some jobs that spawn the configured Hetzner nodes
  3. Restart Jenkins

Expected Results

The unused nodes are removed once their billing cycle ends.
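
For reference, the expected behaviour can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the class name `BillingHourIdleCheck`, the `margin` parameter, and the assumption that a billing hour is counted from the server's creation time are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of a "remove idle server just before the billing hour completes" check. */
final class BillingHourIdleCheck {
    // Assumption: servers are billed in whole hours counted from their creation time.
    static final Duration BILLING_PERIOD = Duration.ofHours(1);

    /** Returns the time left until the server's current billing hour ends. */
    static Duration remainingInBillingHour(Instant createdAt, Instant now) {
        long periodMillis = BILLING_PERIOD.toMillis();
        long elapsedMillis = Duration.between(createdAt, now).toMillis();
        long intoCurrentHour = Math.floorMod(elapsedMillis, periodMillis);
        return Duration.ofMillis(periodMillis - intoCurrentHour);
    }

    /** An idle server is removed only when its billing hour ends within the given margin. */
    static boolean shouldRemove(boolean idle, Instant createdAt, Instant now, Duration margin) {
        return idle && remainingInBillingHour(createdAt, now).compareTo(margin) <= 0;
    }
}
```

Note that such a check depends on knowing the server's creation time, so it cannot work if that information is unavailable after a restart.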

Actual Results

The unused nodes do not seem to be removed anymore.

[Screenshot: Greenshot 2022-06-30 08 15 37]

The nodes in the picture were idle over extended periods of time. Jenkins made use of them when jobs were started, i.e. it did not spawn additional nodes.

We had to manually delete the nodes in Jenkins. With that, Jenkins deleted the Hetzner servers by itself (no manual intervention was needed on the Hetzner side).

Anything else?

Maybe the problem is not only rooted in the restart of Jenkins. When restarting, we also upgraded Jenkins from version 2.340 to the current LTS 2.346.1, and we upgraded all installed plugins.

We never had such an issue before, including when restarting Jenkins. It only appeared after we recently changed the shutdown policy of our Jenkins-Hetzner nodes, following the completion of #30.

rkosegi commented 2 years ago

Hi, thank you for the bug report.

I think I know what the issue is. The data structure that holds details about the Hetzner server (such as the creation timestamp) is intentionally not serialized. That means that after Jenkins is restarted, the safety NULL check is always false, so the node is never removed.

Will look into it.
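
A minimal sketch of that suspected mechanism, for illustration only. It uses plain Java serialization rather than Jenkins' own persistence, and the names `TransientFieldDemo`, `AgentRecord`, and `eligibleForRemoval` are made up for this example, not taken from the plugin.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.Instant;

/** Hypothetical demo: a transient field is lost when an object is saved and restored. */
final class TransientFieldDemo {
    static final class AgentRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        // Marked transient, so it is intentionally left out of the serialized form.
        transient Instant serverCreatedAt;

        AgentRecord(String name, Instant serverCreatedAt) {
            this.name = name;
            this.serverCreatedAt = serverCreatedAt;
        }

        /** Safety check: without server details the agent is never considered for removal. */
        boolean eligibleForRemoval() {
            return serverCreatedAt != null;
        }
    }

    /** Serializes and deserializes the record, standing in for a save followed by a restart. */
    static AgentRecord roundTrip(AgentRecord record) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(record);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (AgentRecord) in.readObject();
        }
    }
}
```

In this sketch, a record restored via `roundTrip` keeps its `name` but has a null `serverCreatedAt`, so `eligibleForRemoval()` returns false from then on.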

rkosegi commented 2 years ago

@sandrinr disregard my previous comment; it's not accurate. Hetzner nodes are EphemeralNodes, which means they are not persisted. We are missing a mechanism to clean up existing nodes during shutdown.
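
Roughly, the missing piece could look like the sketch below. It is plain Java with hypothetical names (`ShutdownCleanupSketch`, `ServerApi`, `cleanupAll`), not the actual fix; a real plugin would presumably hook into Jenkins' own shutdown lifecycle instead of a raw JVM shutdown hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: delete the servers behind non-persisted nodes when the controller stops. */
final class ShutdownCleanupSketch {
    /** Stand-in for a cloud API client; not the real Hetzner client interface. */
    interface ServerApi {
        void deleteServer(long serverId);
    }

    private final Map<String, Long> serverIdByNode = new ConcurrentHashMap<>();
    private final ServerApi api;

    ShutdownCleanupSketch(ServerApi api) {
        this.api = api;
    }

    /** Remembers which cloud server backs a given node. */
    void register(String nodeName, long serverId) {
        serverIdByNode.put(nodeName, serverId);
    }

    /** Deletes every known server and returns the names of the nodes that were cleaned up. */
    List<String> cleanupAll() {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : serverIdByNode.entrySet()) {
            try {
                api.deleteServer(entry.getValue());
                removed.add(entry.getKey());
            } catch (RuntimeException e) {
                // Keep going: one failed deletion should not block the remaining ones.
            }
        }
        serverIdByNode.keySet().removeAll(removed);
        return removed;
    }

    /** Runs the cleanup when the JVM shuts down (assumption: a plain JVM hook is acceptable here). */
    void installJvmShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::cleanupAll, "cloud-node-cleanup"));
    }
}
```

Because the nodes are not persisted, the controller cannot rediscover them after a restart, so the cleanup has to happen before the process exits.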

sandrinr commented 2 years ago

@rkosegi Thanks for the fix 🙏 !

So this had nothing to do with #30 after all.