redimp closed this issue 3 weeks ago.
Without setting HCLOUD_NETWORK, the hcloud-cloud-controller-manager is unable to receive the node address:

```
I0404 08:26:48.044310 1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 08:26:48.247486 1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 08:26:48.247561 1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 08:26:48.688221 1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 08:26:48.688270 1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 08:26:48.954460 1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing
```
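The `failed to get node address from cloud provider that matches ip` error means the kubelet-provided `--node-ip` (the private 10.0.0.x address) is not among the addresses hccm fetched from the Hetzner API — without HCLOUD_NETWORK, only the server's public addresses are known. A minimal Python sketch of that matching step (an analogue of the behaviour, not hccm's actual Go code; the example addresses are made up):

```python
# Sketch: the cloud-node controller only accepts a kubelet-provided --node-ip
# if it appears in the address list reported by the cloud provider. Without
# HCLOUD_NETWORK, hccm reports no private addresses, so a private --node-ip
# can never match and the node is requeued forever.

def find_matching_address(provided_node_ip, cloud_addresses):
    """Return the cloud-reported address matching --node-ip, or None
    (the 'failed to get node address ... that matches ip' case)."""
    for addr in cloud_addresses:
        if addr["address"] == provided_node_ip:
            return addr
    return None

# Without HCLOUD_NETWORK: only a (hypothetical) public address is reported.
public_only = [{"type": "ExternalIP", "address": "203.0.113.10"}]
print(find_matching_address("10.0.0.2", public_only))  # None -> node stays uninitialized

# With HCLOUD_NETWORK: the private Network IP is included and matches.
with_network = public_only + [{"type": "InternalIP", "address": "10.0.0.2"}]
print(find_matching_address("10.0.0.2", with_network))
```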
And for the sake of completeness, with this hccm-values.yaml

```yaml
---
networking:
  enabled: true
  clusterCIDR: 10.42.0.0/16
  network:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
```

the hcloud-cloud-controller-manager starts and adds the metadata as expected.
This is not a solution for us, since
a) we don't want the hccm to manage the routes, and
b) we want to use `robots: true`.
Just to clarify: you mentioned "HelmChart version 3.3.0" in the original issue. We do not have a Helm chart with that version; the current version is 1.19.0.
Sorry, that was a copy-and-paste error. I'm using 1.19.0, as in the helm command line.
I am unable to reproduce this with hccm 1.19.0 and the values file you provided.

While trying to reproduce I noticed that you also need to provide the k3s flag `--disable-cloud-controller`, as otherwise k3s will start its own cloud-controller-manager that conflicts with hccm. You will then see these error messages:

```
Error getting instance metadata for node addresses: hcloud/instancesv2.InstanceMetadata: failed to convert provider id to server id: providerID does not have one of the the expected prefixes (hcloud://, hrobot://, hcloud://bm-): k3s://hetzner-k3s
```
I installed k3s with:

```
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=cloud-provider=external --disable-cloud-controller" INSTALL_K3S_VERSION="v1.29.2+k3s1" sh -
```

Then created a secret for hccm:

```
kubectl create secret generic -n kube-system hcloud --from-literal=token=$HCLOUD_TOKEN --from-literal=network=hetzner-k3s
```
And installed the chart the same way you did, with the first hccm-values.yaml in the original description.

Could you post the output of the two following commands here?

```
kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml
kubectl get node k3s-controlplane1 -o yaml
```
My bad. I must have been lost in the values.

The described behaviour

```
E0404 11:02:01.187593 1 controllermanager.go:321] Error starting "node-route-controller"
F0404 11:02:01.187624 1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )
```

happens with the values.yaml
```yaml
env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
networking:
  enabled: false
robot:
  enabled: false
```
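The fatal `invalid CIDR address:` above is what you get when the route controller is handed an empty cluster CIDR: with `networking.enabled: false` the chart passes no CIDR, yet a configured HCLOUD_NETWORK still causes the route controller to start. The same parse failure can be reproduced with Python's stdlib (an analogue of the Go `net.ParseCIDR` call, not hccm's actual code):

```python
# Analogue of the route controller's CIDR parse: an unset/empty cluster CIDR
# fails to parse, which is the 'invalid CIDR address: ' crash in the log above.
import ipaddress

def parse_cluster_cidr(cidr: str):
    """Return the parsed network, or None when the CIDR is empty/invalid."""
    try:
        return ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return None

print(parse_cluster_cidr("10.42.0.0/16"))  # 10.42.0.0/16
print(parse_cluster_cidr(""))              # None
```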
Note: k3s is running with `--disable-cloud-controller`.
kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: hccm
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-04T11:19:10Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hcloud-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "3440"
  uid: 62e7b715-e99d-4878-8133-d01cd17a95be
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: hccm
      app.kubernetes.io/name: hcloud-cloud-controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: hccm
        app.kubernetes.io/name: hcloud-cloud-controller-manager
    spec:
      containers:
      - command:
        - /bin/hcloud-cloud-controller-manager
        - --allow-untagged-cloud
        - --cloud-provider=hcloud
        - --route-reconciliation-period=30s
        - --webhook-secure-port=0
        - --leader-elect=false
        env:
        - name: HCLOUD_NETWORK
          valueFrom:
            secretKeyRef:
              key: network
              name: hcloud
        - name: HCLOUD_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hcloud
        - name: ROBOT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: robot-password
              name: hcloud
              optional: true
        - name: ROBOT_USER
          valueFrom:
            secretKeyRef:
              key: robot-user
              name: hcloud
              optional: true
        image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
        imagePullPolicy: IfNotPresent
        name: hcloud-cloud-controller-manager
        ports:
        - containerPort: 8233
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: hcloud-cloud-controller-manager
      serviceAccountName: hcloud-cloud-controller-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
status:
  conditions:
  - lastTransitionTime: "2024-04-04T11:19:10Z"
    lastUpdateTime: "2024-04-04T11:19:11Z"
    message: ReplicaSet "hcloud-cloud-controller-manager-6f454fcfbf" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-04-04T11:19:19Z"
    lastUpdateTime: "2024-04-04T11:19:19Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
```
kubectl get node k3s-controlplane1 -o yaml

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.0.0.2
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-04T11:08:33Z"
    etcd.k3s.cattle.io/node-address: 10.0.0.2
    etcd.k3s.cattle.io/node-name: k3s-controlplane1-ba0bd5a4
    k3s.io/node-args: '["server","--data-dir","/var/lib/rancher/k3s","--disable","traefik","--disable","servicelb","--flannel-backend","none","--disable-network-policy","--embedded-registry","true","--write-kubeconfig-mode","0600","--tls-san","lbctrl.iquestria.cso.ninja","--disable-cloud-controller","--token","********","--tls-san","k3s-controlplane1","--tls-san","10.0.0.2","--node-ip","10.0.0.2","--node-external-ip","x.x.x.x","--kubelet-arg","cloud-provider=external"]'
    k3s.io/node-config-hash: QNU4YAKJZSOORINBMHYXXYIO754HSV5OGAWEWZC56NJR74RX56AQ====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/4344eae0657f7fc0c99af34fc51358389f500f18c9bb80f5a55c130de07565d2"}'
    node.alpha.kubernetes.io/ttl: "0"
    p2p.k3s.cattle.io/node-address: /ip4/10.0.0.2/tcp/5001/p2p/QmWjS45ca9RZuoMnavYUhNHH4wD7V4SXVHRhzcn1tCWNdi
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-04T11:07:10Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: k3s-controlplane1
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    p2p.k3s.cattle.io/enabled: "true"
  name: k3s-controlplane1
  resourceVersion: "4135"
  uid: c1b6d78b-55dc-47f8-9ba0-557b81a452a7
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
status:
  addresses:
  - address: 10.0.0.2
    type: InternalIP
  - address: k3s-controlplane1
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: "55192664021"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7934528Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 56735880Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 7934528Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-04T11:10:25Z"
    lastTransitionTime: "2024-04-04T11:10:25Z"
    message: Cilium is running on this node
    reason: CiliumIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2024-04-04T11:22:30Z"
    lastTransitionTime: "2024-04-04T11:07:22Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:07:10Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-04T11:20:46Z"
    lastTransitionTime: "2024-04-04T11:10:20Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/cilium/cilium@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746
    sizeBytes: 195832613
  - names:
    - quay.io/cilium/operator-generic@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088
    sizeBytes: 24175419
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: b44ffa8e-82e2-4740-b6ab-bf53631f8310
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.1.0-18-arm64
    kubeProxyVersion: v1.29.2+k3s1
    kubeletVersion: v1.29.2+k3s1
    machineID: e7c1065f9ccd42ce8d0c10c61a494f91
    operatingSystem: linux
    osImage: Debian GNU/Linux 12 (bookworm)
    systemUUID: 2376c8c9-a1c5-4485-8bea-efcfa76fb865
```
With

```yaml
networking:
  enabled: false
  network:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
robot:
  enabled: false
```

there is no HCLOUD_NETWORK env variable set:
kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: hccm
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-04T11:10:32Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: hcloud-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "2171"
  uid: e97fe5ed-db35-4eaf-a290-371b87780a2c
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: hccm
      app.kubernetes.io/name: hcloud-cloud-controller-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: hccm
        app.kubernetes.io/name: hcloud-cloud-controller-manager
    spec:
      containers:
      - command:
        - /bin/hcloud-cloud-controller-manager
        - --allow-untagged-cloud
        - --cloud-provider=hcloud
        - --route-reconciliation-period=30s
        - --webhook-secure-port=0
        - --leader-elect=false
        env:
        - name: HCLOUD_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hcloud
        - name: ROBOT_PASSWORD
          valueFrom:
            secretKeyRef:
              key: robot-password
              name: hcloud
              optional: true
        - name: ROBOT_USER
          valueFrom:
            secretKeyRef:
              key: robot-user
              name: hcloud
              optional: true
        image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
        imagePullPolicy: IfNotPresent
        name: hcloud-cloud-controller-manager
        ports:
        - containerPort: 8233
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: hcloud-cloud-controller-manager
      serviceAccountName: hcloud-cloud-controller-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-04-04T11:10:33Z"
    lastUpdateTime: "2024-04-04T11:10:37Z"
    message: ReplicaSet "hcloud-cloud-controller-manager-584f6fc4f4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-04-04T11:13:22Z"
    lastUpdateTime: "2024-04-04T11:13:22Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```
I appreciate the help.
For the sake of completeness: without HCLOUD_NETWORK being set, hccm is not able to fetch the metadata.

```
[...]
I0404 11:13:24.431083 1 controllermanager.go:337] Started "cloud-node-lifecycle-controller"
I0404 11:13:24.431122 1 node_lifecycle_controller.go:113] Sending events to api server
I0404 11:13:24.512098 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0404 11:13:24.512144 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 11:13:24.512166 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0404 11:13:24.531534 1 shared_informer.go:318] Caches are synced for service
I0404 11:13:24.531581 1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 11:13:24.964475 1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 11:13:24.964549 1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 11:13:25.149436 1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 11:13:25.149485 1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 11:13:25.317226 1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing
```
Thanks for the detailed responses :)
I can reproduce the issue with these values from your comment yesterday:
```yaml
env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
networking:
  enabled: false
robot:
  enabled: false
```
The core issue is that hccm & the Helm Chart always assume that users with Networks also want to use the Routing functionality. This is not always true; there are cases where you want the InternalIP on the Node but no routes. As you have discovered, this is not natively supported in the Helm Chart right now.

You can set the env variable HCLOUD_NETWORK_ROUTES_ENABLED=false to disable just the routes controller.
These values should work (or just yours with the env variable added):
```yaml
env:
  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"
networking:
  enabled: true
```
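The gating this describes can be sketched roughly as follows (my reading of this thread, not hccm's actual source; the env handling is simplified):

```python
# Sketch (assumption, not hccm's Go code): the routes controller is started
# when a Network is configured, unless HCLOUD_NETWORK_ROUTES_ENABLED=false
# explicitly opts out.
def routes_controller_enabled(env):
    if not env.get("HCLOUD_NETWORK"):
        return False  # no private Network -> nothing to route over
    return env.get("HCLOUD_NETWORK_ROUTES_ENABLED", "true").lower() != "false"

# The failing setup: Network set, routes not opted out -> controller starts,
# then crashes on the missing cluster CIDR.
print(routes_controller_enabled({"HCLOUD_NETWORK": "hetzner-k3s"}))  # True

# The suggested fix: keep the Network (for InternalIPs) but skip routes.
print(routes_controller_enabled({
    "HCLOUD_NETWORK": "hetzner-k3s",
    "HCLOUD_NETWORK_ROUTES_ENABLED": "false",
}))  # False
```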
Thank you, will test that.

With HCLOUD_NETWORK_ROUTES_ENABLED=false, can we configure ROBOT_ENABLED=true so that the dedicated nodes are handled by the hcloud-cloud-controller-manager, too?
Yes, should work :+1: You will have to do some magic to get the private IPs for the Robot Servers in, as that is not automatically supported in HCCM right now.
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
TL;DR

Despite `network: false`, the hcloud-cloud-controller-manager tries to start the node-route-controller. The node-route-controller then fails due to the missing CIDR.

Expected behavior

hcloud-cloud-controller-manager starting up and configuring the nodes' metadata.

Observed behavior

hcloud-cloud-controller-manager pod crashes with

Minimal working example

command:

hccm-values.yaml:

Remark: The same happens when configuring

as described in the README.md.

Log output

Additional information

--kubelet-arg="cloud-provider=external"