syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner 🚀 Kubernetes Infrastructure as Software 🔧 Terraform/Kubespray/kOps alternative for running Kubernetes on Hetzner
https://caph.syself.com
Apache License 2.0
539 stars 51 forks source link

:seedling: CSR uses INFO level if providerID was not set yet. #1357

Closed guettli closed 1 week ago

guettli commented 1 week ago

What this PR does / why we need it:

In this PR https://github.com/syself/cluster-api-provider-hetzner/pull/1351 we set the ProviderID earlier, but this did not work.

The linked PR was done to avoid error messages in the CSR-controller.

This PR changes the error messages of the CSR-controller. If the CSR exists, but there is no hbmm with the corresponding ProviderID, then this is fine. We wait some seconds and then we try again.

The hbmm is likely in state "ensure-provisioned", and the providerID will be set soon. INFO level logging is fine in this case.

guettli commented 1 week ago

Testing: Without constant-hostname it works. Related logs:

level: DEBUG
time: "2024-06-21T09:20:21.784Z"
file: host/state_machine.go:73
message: changing provisioning state
controller: hetznerbaremetalhost
controllerGroup: infrastructure.cluster.x-k8s.io
controllerKind: HetznerBareMetalHost
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
namespace: default
name: test-bm-uefi
reconcileID: 173abccd-a45c-4cc7-a663-18ceca646d2d
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
HetznerCluster:
  name: caph-guettli
  namespace: default
Cluster:
  name: caph-guettli
  namespace: default
HetznerBareMetalMachine:
  name: caph-guettli-md-1-wvp4v-st54z
  namespace: default
old: provisioning
new: ensure-provisioned
---
level: DEBUG
time: "2024-06-21T09:20:21.784Z"
logger: events
file: recorder/recorder.go:104
message: 'Phase provisioning, reboot via SSH: UserData of cloud-init was created'
type: Normal
object:
  kind: HetznerBareMetalHost
  namespace: default
  name: test-bm-uefi
  uid: dd2a648f-d375-4e93-8a50-90e54e781a6f
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  resourceVersion: "116625"
reason: RebootBMServerViaSSHProvisioningStateProvisioning
---
level: DEBUG
time: "2024-06-21T09:20:21.906Z"
file: host/host.go:156
message: conflict error. Retrying
controller: hetznerbaremetalhost
controllerGroup: infrastructure.cluster.x-k8s.io
controllerKind: HetznerBareMetalHost
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
namespace: default
name: test-bm-uefi
reconcileID: a0f653a1-24fb-4462-aba8-210e9749599a
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
HetznerCluster:
  name: caph-guettli
  namespace: default
Cluster:
  name: caph-guettli
  namespace: default
HetznerBareMetalMachine:
  name: caph-guettli-md-1-wvp4v-st54z
  namespace: default
err: 'Operation cannot be fulfilled on hetznerbaremetalhosts.infrastructure.cluster.x-k8s.io "test-bm-uefi": the object has been modified; please apply your changes to the latest version and try again'
---
level: DEBUG
time: "2024-06-21T09:22:27.918Z"
file: controllers/csr_controller.go:311
message: No constant baremetal hostname - regex does not match CSR username
controller: certificatesigningrequest
controllerGroup: certificates.k8s.io
controllerKind: CertificateSigningRequest
CertificateSigningRequest:
  name: csr-8lvwd
namespace: ""
name: csr-8lvwd
reconcileID: a33a6a29-8d4a-477e-828c-3e9c6ec7450d
regex: ^bm-(\S*)-(\d+)$
csrUserName: system:node:bm-caph-guettli-md-1-wvp4v-st54z
---
level: DEBUG
time: "2024-06-21T09:22:27.934Z"
logger: events
file: recorder/recorder.go:104
message: approved csr
type: Normal
object:
  kind: CertificateSigningRequest
  name: csr-8lvwd
  uid: e7d0b975-ea14-49d0-bdd0-7897f64a9fae
  apiVersion: certificates.k8s.io/v1
  resourceVersion: "202823"
reason: CSRApproved
---
level: DEBUG
time: "2024-06-21T09:22:34.244Z"
logger: events
file: recorder/recorder.go:104
message: server successfully provisioned
type: Normal
object:
  kind: HetznerBareMetalHost
  namespace: default
  name: test-bm-uefi
  uid: dd2a648f-d375-4e93-8a50-90e54e781a6f
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  resourceVersion: "116919"
reason: ServerProvisioned
---
level: DEBUG
time: "2024-06-21T09:22:34.819Z"
file: host/state_machine.go:73
message: changing provisioning state
controller: hetznerbaremetalhost
controllerGroup: infrastructure.cluster.x-k8s.io
controllerKind: HetznerBareMetalHost
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
namespace: default
name: test-bm-uefi
reconcileID: 96a88d5a-f57b-4a11-a4c6-0b603e46e98f
HetznerBareMetalHost:
  name: test-bm-uefi
  namespace: default
HetznerCluster:
  name: caph-guettli
  namespace: default
Cluster:
  name: caph-guettli
  namespace: default
HetznerBareMetalMachine:
  name: caph-guettli-md-1-wvp4v-st54z
  namespace: default
old: ensure-provisioned
new: provisioned
---
level: DEBUG
time: "2024-06-21T09:22:34.819Z"
logger: events
file: recorder/recorder.go:104
message: "/var/log/cloud-init-output.log: Cloud-init v. 24.1.3-0ubuntu1~22.04.5  [cut] ..."
type: Normal
object:
  kind: HetznerBareMetalHost
  namespace: default
  name: test-bm-uefi
  uid: dd2a648f-d375-4e93-8a50-90e54e781a6f
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
  resourceVersion: "116919"
reason: CloudInitOutput
guettli commented 1 week ago

Here the logs with a machine with constant-hostname (I filtered the output a bit so that it has less lines):

the important part "ProviderID not set yet." is highlighted. It appears only once. So waiting 5s seems good.


level: DEBUG
time: "2024-06-21T10:03:30.288Z"
logger: events
message: 'will retry: . StdErr:  (failed to dial ssh. Error message: dial tcp 37.27.97.19:22: connect: connection refused. DialErr: failed to dial ssh)'
object:
  kind: HetznerBareMetalHost
  name: test-bm-uefi
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
reason: SSHToRescueSystemFailed
---
level: DEBUG
time: "2024-06-21T10:03:30.293Z"
message: conflict error. Retrying
HetznerBareMetalHost:
  name: test-bm-uefi
name: test-bm-uefi
HetznerBareMetalHost:
  name: test-bm-uefi
HetznerCluster:
Cluster:
HetznerBareMetalMachine:
err: 'Operation cannot be fulfilled on hetznerbaremetalhosts.infrastructure.cluster.x-k8s.io "test-bm-uefi": the object has been modified; please apply your changes to the latest version and try again'
---
level: DEBUG
time: "2024-06-21T10:03:30.378Z"
message: conflict error. Retrying
HetznerBareMetalHost:
  name: test-bm-uefi
name: test-bm-uefi
HetznerBareMetalHost:
  name: test-bm-uefi
HetznerCluster:
Cluster:
HetznerBareMetalMachine:
err: 'Operation cannot be fulfilled on hetznerbaremetalhosts.infrastructure.cluster.x-k8s.io "test-bm-uefi": the object has been modified; please apply your changes to the latest version and try again'
---
level: DEBUG
time: "2024-06-21T10:04:51.952Z"
message: changing provisioning state
HetznerBareMetalHost:
  name: test-bm-uefi
name: test-bm-uefi
HetznerBareMetalHost:
  name: test-bm-uefi
HetznerCluster:
Cluster:
HetznerBareMetalMachine:
old: provisioning
new: ensure-provisioned
---
level: DEBUG
time: "2024-06-21T10:04:51.953Z"
logger: events
message: 'Phase provisioning, reboot via SSH: UserData of cloud-init was created'
object:
  kind: HetznerBareMetalHost
  name: test-bm-uefi
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
reason: RebootBMServerViaSSHProvisioningStateProvisioning
---
level: DEBUG
time: "2024-06-21T10:04:52.107Z"
message: conflict error. Retrying
HetznerBareMetalHost:
  name: test-bm-uefi
name: test-bm-uefi
HetznerBareMetalHost:
  name: test-bm-uefi
HetznerCluster:
Cluster:
HetznerBareMetalMachine:
err: 'Operation cannot be fulfilled on hetznerbaremetalhosts.infrastructure.cluster.x-k8s.io "test-bm-uefi": the object has been modified; please apply your changes to the latest version and try again'
---

*************** HERE ***********

level: INFO
time: "2024-06-21T10:06:56.046Z"
message: 'ProviderID not set yet. The hbmm seems to be in ''ensure-provision''. Retrying. ProviderID: "hcloud://bm-2317525" no HetznerBaremetalMachine by ProviderID found'
CertificateSigningRequest:
  name: csr-6pxgn
name: csr-6pxgn
CertificateSigningRequest:
  name: csr-6pxgn

---
level: DEBUG
time: "2024-06-21T10:06:57.856Z"
logger: events
message: server successfully provisioned
object:
  kind: HetznerBareMetalHost
  name: test-bm-uefi
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
reason: ServerProvisioned
---
level: DEBUG
time: "2024-06-21T10:06:58.522Z"
message: changing provisioning state
HetznerBareMetalHost:
  name: test-bm-uefi
name: test-bm-uefi
HetznerBareMetalHost:
  name: test-bm-uefi
HetznerCluster:
Cluster:
HetznerBareMetalMachine:
old: ensure-provisioned
new: provisioned
---
level: DEBUG
time: "2024-06-21T10:06:58.522Z"
logger: events
message: "/var/log/cloud-init-output.log: Cloud-init v. 24.1.3-0ubuntu1~22.04.5 .. .cut"
object:
  kind: HetznerBareMetalHost
  name: test-bm-uefi
  apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
reason: CloudInitOutput
---
level: INFO
time: "2024-06-21T10:07:01.047Z"
message: Found HetznerBareMetalMachine with constant hostname
CertificateSigningRequest:
  name: csr-6pxgn
name: csr-6pxgn
csr-username: system:node:bm-caph-guettli-2317525
hetznerBareMetalMachine: caph-guettli-md-1-vn5zw-v7gv6
---
level: INFO
time: "2024-06-21T10:07:01.047Z"
message: found hbmm for system:node:bm-caph-guettli-2317525 (ConstantHostname)
CertificateSigningRequest:
  name: csr-6pxgn
name: csr-6pxgn
---
level: DEBUG
time: "2024-06-21T10:07:01.069Z"
logger: events
message: approved csr
object:
  kind: CertificateSigningRequest
  name: csr-6pxgn
  apiVersion: certificates.k8s.io/v1
reason: CSRApproved