HetznerCloud reports Cloud capacity reached in error #21

sandrinr commented 2 years ago

Jenkins and plugins versions report

Jenkins: 2.340
OS: Linux - 5.4.0-66-generic

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller: Ubuntu 20.04 Nodes: Ubuntu 20.04

Reproduction steps

  1. Set maximum cloud nodes in /configureClouds/ Instance Cap to 10
  2. Let the Hetzner Cloud Plugin run
  3. Enjoy automatic scaling of Jenkins nodes
  4. At some point the Jenkins cloud console starts reporting Invalid API response : 412 and fails to start runners
  5. Looking at the Jenkins log one finds WARNING hudson.slaves.NodeProvisioner#update: Unexpected exception encountered while provisioning agent hcloud-enj1tr4eywrg3brp and WARNING c.d.j.p.hetzner.HetznerCloud#provision: Cloud capacity reached. Has 10 but want 9 more even though only four nodes are running

Cloud Console:


Cloud config:


Expected Results

Jenkins spawns nodes until it really reaches the configured limit

Actual Results

Jenkins stops launching new nodes even though the limit would allow it.

Anything else?

rkosegi commented 2 years ago

Thank you for the bug report. Can you confirm how many VM instances are actually in your hcloud project (eg. via web console or using CLI)? Also, would it be possible to configure debug logging for logger cloud.dnation.jenkins.plugins.hetzner in your instance and share relevant logs? One more query, how many executors are configured on agent hetzner-cloud-cicd?

The thing is, I'm not really sure where the error is coming from.

sandrinr commented 2 years ago

Thank you for picking this up.

Can you confirm how many VM instances are actually in your hcloud project (eg. via web console or using CLI)?

At that point exactly the 4 instances known to Jenkins were visible in hcloud.

Also, would it be possible to configure debug logging for logger cloud.dnation.jenkins.plugins.hetzner in your instance and share relevant logs?

Activated that now. When experiencing the issue I increased the maximum Instance Cap to get rid of the issue. I lowered it again to the old value. We only hit the issue twice until now. The last time was a month or so ago.

One more query, how many executors are configured on agent hetzner-cloud-cicd

6 per node. I was wondering whether that has any influence on the calculation.

rkosegi commented 2 years ago

Activated that now. When experiencing the issue I increased the maximum Instance Cap to get rid of the issue. I lowered it again to the old value. We only hit the issue twice until now. The last time was a month or so ago.

That's interesting information. The working theory is that something went wrong in Hetzner Cloud itself (like Invalid API response : 412, which is not documented) and Jenkins just kept provisioning new agents to satisfy the excess workload (which counts towards the instance cap), but was not able to connect to the newly created instances.

6 per node. I was wondering whether that has any influence on the calculation.

Actually, it does. More executors per agent means fewer virtual machines. Looking at the code, I have a feeling the calculation might be wrong. Let me do some tests and possibly a fix.

To recap, the error in the report above is actually a manifestation of something else and, at the same time, the calculation of remaining capacity might be incorrect. I will keep you posted.
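For readers following along, the relationship between executors, VMs, and the instance cap can be sketched roughly as below. This is only an illustration under stated assumptions, not the plugin's actual code: the function name `agents_to_provision` and its parameters are hypothetical. It assumes excess workload is counted in executors while the instance cap is counted in VMs, so the workload has to be converted to whole VMs before comparing it with the cap.

```python
import math


def agents_to_provision(instance_cap, running_vms, excess_workload, executors_per_agent):
    """Return how many new VMs to request without exceeding the instance cap.

    Hypothetical helper for illustration only; not taken from the plugin.
    """
    if executors_per_agent <= 0:
        raise ValueError("executors_per_agent must be positive")
    # Workload is counted in executors, so convert it to whole VMs first.
    wanted_vms = math.ceil(max(excess_workload, 0) / executors_per_agent)
    # The cap is counted in VMs, so compare VMs with VMs.
    remaining_vms = max(instance_cap - running_vms, 0)
    return min(wanted_vms, remaining_vms)
```

With a cap of 10, four running VMs, an excess workload of 9 executors, and 6 executors per agent, this sketch would request two more VMs instead of reporting that capacity is reached.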

rkosegi commented 2 years ago

@sandrinr I fixed the calculation of available capacity, and a new release should be available soon in the update center. If you encounter any issues, feel free to open a new issue.

sandrinr commented 2 years ago

Thank you for the update!

Yesterday afternoon we again hit the Invalid API response: 412 error. However, I think this time it was something different.

2022-04-06 16:28:00.530+0000 [id=66]    INFO    h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning hcloud-2xa63r9g30avnohj from hetzner-cloud-cicd with 6 executors. Remaining excess workload: -5
2022-04-06 16:28:00.871+0000 [id=395687]        INFO    c.d.j.p.h.HetznerCloudResourceManager#searchResourceByLabelExpression: Trying to find single resource for label expression 'type=jenkins-node'
2022-04-06 16:28:09.571+0000 [id=52]    WARNING hudson.slaves.NodeProvisioner#update: Unexpected exception encountered while provisioning agent hcloud-2xa63r9g30avnohj
java.lang.IllegalStateException: Invalid API response : 412
        at cloud.dnation.jenkins.plugins.hetzner.Helper.assertValidResponse(
        at cloud.dnation.jenkins.plugins.hetzner.HetznerCloudResourceManager.createServer(
        at jenkins.util.ContextResettingExecutorService$
        at java.base/
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.base/java.util.concurrent.ThreadPoolExecutor$
        at java.base/

With errors like these I would expect to find stray Hetzner cloud nodes lying around. However, until now we have not had a single stray node that we had to clean up manually. Somehow Jenkins manages to keep track of them and clean them up even in case of such errors.

rkosegi commented 2 years ago

With errors like these I would expect to find stray Hetzner cloud nodes lying around. However, until now we have not had a single stray node that we had to clean up manually. Somehow Jenkins manages to keep track of them and clean them up even in case of such errors.

This happened during server creation, so it never got the chance to create a stray server in the cloud. Also, there is a periodic job (once every hour) that removes stray servers automatically.
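The idea behind such a cleanup job can be sketched as a simple set difference. This is a hedged illustration, not the plugin's implementation: the function `find_stray_servers` and the dictionary layout of `cloud_servers` are assumptions, and only the label `type=jenkins-node` is taken from the log lines in this thread.

```python
def find_stray_servers(cloud_servers, known_agent_names,
                       label_key="type", label_value="jenkins-node"):
    """Return names of labelled cloud servers that Jenkins does not know about.

    Hypothetical helper: cloud_servers is assumed to be a list of dicts
    with a "name" string and a "labels" dict.
    """
    known = set(known_agent_names)
    return [
        server["name"]
        for server in cloud_servers
        # Only consider servers carrying the Jenkins node label.
        if server.get("labels", {}).get(label_key) == label_value
        # A server is stray when no Jenkins agent has the same name.
        and server["name"] not in known
    ]
```

A periodic job could then delete whatever this returns; servers without the label are never touched.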

I'm still wondering how to provoke Hetzner Cloud into producing this HTTP 412. Maybe the only way to find out is to start Jenkins with the system property -Dcloud.dnation.jenkins.plugins.hetzner.client-debug=true, which would put Hetzner API requests and responses into the log. Let me think about it; there should be an easier way to debug this.

rkosegi commented 2 years ago

With release 39, API requests and responses can be seen in a log recorder if you configure the ALL level for logger cloud.dnation.jenkins.plugins.hetzner.client

If you encounter an HTTP 412 response, you can search for the string <-- 412, which should be followed by the response body (in JSON format). That might contain additional details about what went wrong.
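If the log is large, the search can be scripted. The sketch below is a minimal example under assumptions: it expects each response to start with a line beginning with `<-- 412`, followed by headers, a blank line, the JSON body, and a closing line beginning with `<-- END HTTP`. The function name `extract_412_bodies` is made up for this example, and lines carrying extra prefixes (such as timestamps) would need to be stripped first.

```python
import json


def extract_412_bodies(log_text):
    """Collect the bodies of all 412 responses found in a client debug log."""
    bodies = []
    lines = log_text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].startswith("<-- 412"):
            # Gather everything up to the end-of-response marker.
            block = []
            i += 1
            while i < len(lines) and not lines[i].startswith("<-- END HTTP"):
                block.append(lines[i])
                i += 1
            # Headers and body are separated by the first blank line.
            blank = next((k for k, line in enumerate(block) if not line.strip()), None)
            body = "" if blank is None else "\n".join(block[blank + 1:]).strip()
            try:
                bodies.append(json.loads(body))
            except json.JSONDecodeError:
                # Keep unparseable bodies so nothing is silently dropped.
                bodies.append({"raw": body})
        i += 1
    return bodies
```

Each returned entry is the parsed JSON body, so the error code can be read from it directly when the body has an `error` object.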

sandrinr commented 2 years ago

This morning we got the 412 response again. I was able to capture the error:

--> POST
Content-Type: application/json; charset=UTF-8
Content-Length: 276
Authorization: ██
User-Agent: Jenkins Hetzner Plugin


--> END POST (276-byte body)
<-- 412 (219ms)
date: Wed, 13 Apr 2022 06:30:58 GMT
content-type: application/json
content-length: 118
ratelimit-limit: 3600
ratelimit-remaining: 3598
ratelimit-reset: 1649831460
x-correlation-id: ██
strict-transport-security: max-age=15724800; includeSubDomains
access-control-allow-origin: *
access-control-allow-credentials: true

{
  "error": {
    "message": "error during placement",
    "code": "resource_unavailable",
    "details": null
  }
}
<-- END HTTP (118-byte body)

Meanwhile the error in the main Jenkins log was:

2022-04-13 06:30:46.533+0000 [id=66]    INFO    c.d.j.p.hetzner.HetznerCloud#provision: Creating new agent with 6 executors, have 0 running VMs
2022-04-13 06:30:46.534+0000 [id=66]    INFO    h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning  from hetzner-cloud-cicd with 6 executors. Remaining excess workload: -5
2022-04-13 06:30:46.917+0000 [id=177990]        INFO    c.d.j.p.h.HetznerCloudResourceManager#searchResourceByLabelExpression: Trying to find single resource for label expression 'type=jenkins-node'
2022-04-13 06:30:56.307+0000 [id=29]    WARNING hudson.slaves.NodeProvisioner#update: Unexpected exception encountered while provisioning agent ██
java.lang.IllegalStateException: Invalid API response : 412
        at cloud.dnation.jenkins.plugins.hetzner.Helper.assertValidResponse(
        at cloud.dnation.jenkins.plugins.hetzner.HetznerCloudResourceManager.createServer(
        at jenkins.util.ContextResettingExecutorService$
        at java.base/
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.base/java.util.concurrent.ThreadPoolExecutor$
        at java.base/

I think this is different from my initial report. Here, it really seems to be a Hetzner issue. However, when I run hcloud server create with the same token, which I guess makes the same API call, it works. That could be a coincidence, though.

rkosegi commented 2 years ago

{ "error": { "message": "error during placement", "code": "resource_unavailable", "details": null } }

That confirms the issue is on Hetzner's side. I suspect that there are no servers available in the given datacenter. A similar issue is reported here

sandrinr commented 2 years ago

Should anybody else stumble upon this: we solved the underlying issue by configuring multiple server templates in Jenkins' Cloud Configuration for Hetzner, one template for each location we want to support. The templates all spawn nodes with the same executor labels. I don't know exactly how the templates get selected in this case, but we have not observed downtimes due to resource placement issues since that change. Jenkins was always able to create resources using another template should one fail.
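The effect of this workaround can be pictured as a fallback loop over locations. The sketch below only illustrates that idea and makes no claim about how the plugin actually selects templates: `PlacementError`, `create_with_fallback`, and the `create_server` callable are all hypothetical names, and the location values in the usage note are just examples.

```python
class PlacementError(Exception):
    """Raised when a location cannot place a new server (hypothetical)."""


def create_with_fallback(locations, create_server):
    """Try each location in order and return the first server created.

    create_server is assumed to be a callable that takes a location name
    and either returns a server or raises PlacementError.
    """
    errors = {}
    for location in locations:
        try:
            return create_server(location)
        except PlacementError as exc:
            # Remember why this location failed and move on to the next.
            errors[location] = str(exc)
    raise PlacementError(f"no location could place the server: {errors}")
```

Called with a list such as `["fsn1", "nbg1", "hel1"]`, the loop only fails when every location reports a placement error, which matches the observation that one unavailable location no longer blocks provisioning.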