Closed: LeoShivas closed this issue 1 year ago.
Hello,
error retrieving resource lock kube-system/cloud-controller-manager-proxmox: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager-proxmox?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Check cilium logs.
cloud-node - initializes (labels) the nodes.
cloud-node-lifecycle - only deletes the node resource if the VM was deleted in Proxmox.
So use both of them:
enabledControllers:
- cloud-node
- cloud-node-lifecycle
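For reference, the same two controllers can also be enabled straight from the Helm CLI (a sketch; it assumes the chart's top-level enabledControllers value shown later in this thread):

# enable both controllers when installing the chart
helm upgrade --install proxmox-cloud-controller-manager \
  oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager \
  --namespace kube-system \
  --set 'enabledControllers={cloud-node,cloud-node-lifecycle}'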
PS, my cilium config - https://github.com/sergelogvinov/terraform-talos/blob/main/_deployments/vars/cilium.yaml
Since my last message, I've re-enabled kube-proxy.
I've also corrected the enabledControllers and set it as follows:
- name: Install Proxmox CCM chart
  kubernetes.core.helm:
    name: proxmox-cloud-controller-manager
    namespace: kube-system
    chart_ref: oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
    values:
      config:
        clusters:
          - url: "{{ proxmox_url }}"
            insecure: false
            token_id: "kubernetes@pve!ccm"
            token_secret: "xxxxxxxxxxxxxxxxxx"
            region: main
      enabledControllers:
        - cloud-node
        - cloud-node-lifecycle
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
Here are the logs I encountered in the kube-system/proxmox-cloud-controller-manager-7b85484c94 pod:
E1023 07:35:40.972347 1 leaderelection.go:332] error retrieving resource lock kube-system/cloud-controller-manager-proxmox: Get "https://10.96.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager-proxmox?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I1023 07:36:38.058263 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-cp-1?
I1023 07:36:38.058542 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-cp-2?
I1023 07:36:38.058650 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-1?
I1023 07:36:38.058697 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-2?
I1023 07:36:38.058780 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-3?
I1023 07:36:38.097378 1 node_controller.go:267] Update 5 nodes status took 299.025382ms.
I will have a look at your Cilium configuration.
You need to have the --cloud-provider param set on the kubelet daemon.
Without it, the nodes will initialize themselves.
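For a kubeadm-based setup like this one, a minimal way to do that is via the file the kubelet drop-in already sources (a sketch, assuming /etc/sysconfig/kubelet as on RHEL-family systems, and that the file does not already carry other args):

# add the flag to the kubelet extra args and restart the service
echo 'KUBELET_EXTRA_ARGS=--cloud-provider=external' > /etc/sysconfig/kubelet
systemctl daemon-reload && systemctl restart kubelet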
You need to have the --cloud-provider param set on the kubelet daemon. Without it, the nodes will initialize themselves.
But as the official kubelet documentation states:
--cloud-provider string
The provider for cloud services. Set to empty string for running with no cloud provider. If set, the cloud provider
determines the name of the node (consult cloud provider documentation to determine if and how the hostname
is used). (DEPRECATED: will be removed in 1.24 or later, in favor of removing cloud provider code from kubelet.)
Wait! As I'm writing this, I see that "Undeprecated kubelet cloud-provider flag" landed 4 days ago!
In most cases, DEPRECATED means you need to use the kubelet config.yaml :)
Yes, you're surely right.
But I can't find the --cloud-provider equivalent option in the KubeletConfiguration object: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration
Here's mine:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
providerID: "proxmox://mycluster/mypvenode"
Proxmox CCM can set the providerID for you... (if the node name == the VM name in Proxmox)
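Once the node has been initialized, the result can be checked with something like this (a sketch; the node name is taken from this thread):

kubectl get node kube-cp-1 -o jsonpath='{.spec.providerID}{"\n"}'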
I've manually added the --cloud-provider option to the kubelet service start command line:
[root@kube-cp-1 ~]# systemctl cat kubelet.service
# /usr/lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
# /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet --cloud-provider=external $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
I've run the following commands:
systemctl daemon-reload
systemctl restart kubelet.service
I've deleted the kube-system/proxmox-cloud-controller-manager- pod and waited a while.
Here are the logs of my kubelet service:
[root@kube-cp-1 ~]# journalctl -u kubelet -f
Oct 23 11:02:05 kube-cp-1 kubelet[47031]: "Metadata": null
Oct 23 11:02:05 kube-cp-1 kubelet[47031]: }. Err: connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock: connect: connection refused"
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: W1023 11:02:08.103331 47031 logging.go:59] [core] [Channel #60 SubChannel #61] grpc: addrConn.createTransport failed to connect to {
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "Addr": "/var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock",
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "ServerName": "/var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock",
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "Attributes": null,
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "BalancerAttributes": null,
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "Type": 0,
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: "Metadata": null
Oct 23 11:02:08 kube-cp-1 kubelet[47031]: }. Err: connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock: connect: connection refused"
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: W1023 11:02:11.775421 47031 logging.go:59] [core] [Channel #60 SubChannel #61] grpc: addrConn.createTransport failed to connect to {
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "Addr": "/var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock",
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "ServerName": "/var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock",
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "Attributes": null,
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "BalancerAttributes": null,
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "Type": 0,
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: "Metadata": null
Oct 23 11:02:11 kube-cp-1 kubelet[47031]: }. Err: connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock: connect: connection refused"
Oct 23 11:02:12 kube-cp-1 kubelet[47031]: E1023 11:02:12.487082 47031 goroutinemap.go:150] Operation for "/var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock" failed. No retries permitted until 2023-10-23 11:02:20.487055393 +0200 CEST m=+213.991348269 (durationBeforeRetry 8s). Error: RegisterPlugin error -- dial failed at socket /var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock, err: failed to dial socket /var/lib/kubelet/plugins_registry/csi.proxmox.sinextra.dev-reg.sock, err: context deadline exceeded
Oct 23 11:02:13 kube-cp-1 kubelet[47031]: I1023 11:02:13.868747 47031 scope.go:115] "RemoveContainer" containerID="6c85953efd6b2a685899bd9c7c7fcaf758ee61c6a2efb3ad109483f33fa95b24"
Oct 23 11:02:13 kube-cp-1 kubelet[47031]: E1023 11:02:13.869277 47031 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"csi-node-driver-registrar\" with CrashLoopBackOff: \"back-off 1m20s restarting failed container=csi-node-driver-registrar pod=proxmox-csi-plugin-node-bmn4q_csi-proxmox(3a4674a9-5e87-4d30-bb0f-de83ecafa20b)\"" pod="csi-proxmox/proxmox-csi-plugin-node-bmn4q" podUID=3a4674a9-5e87-4d30-bb0f-de83ecafa20b
I haven't deployed the CSI plugin yet.
Proxmox CCM can set the providerID for you... (if the node name == the VM name in Proxmox)
Yeah, I know it's supposed to do that :-) but it seems my CCM doesn't work... :-(
I've reinstalled everything from scratch (as usual :-) ). If you want (and have time to do so), you can have a look at my Ansible playbook deployment: https://github.com/LeoShivas/GitOps/blob/main/ansible/playbooks/kubernetes/playbook-kube-install.yml
I've updated the kubelet service by adding the --cloud-provider option on one node (kube-cp-1) and rebooted it.
Here are the kube-system/proxmox-cloud-controller-manager-xxxxx pod logs:
I1023 11:17:10.406986 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-cp-1?
I1023 11:17:10.407037 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-cp-2?
I1023 11:17:10.407057 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-1?
I1023 11:17:10.407280 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-2?
I1023 11:17:10.407315 1 instances.go:159] instances.InstanceMetadata() is kubelet has --cloud-provider=external on the node kube-wk-3?
I1023 11:17:10.407464 1 node_controller.go:267] Update 5 nodes status took 569.596µs.
Here are the kubelet systemd service logs (after the reboot, when I deleted the CCM pod):
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.241837 908 scope.go:115] "RemoveContainer" containerID="00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.317350 908 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"cloud-config\" (UniqueName: \"kubernetes.io/secret/d0390647-30eb-41ca-93cc-5726172d86c8-cloud-config\") pod \"d0390647-30eb-41ca-93cc-5726172d86c8\" (UID: \"d0390647-30eb-41ca-93cc-5726172d86c8\") "
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.317434 908 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"kube-api-access-2ghs9\" (UniqueName: \"kubernetes.io/projected/d0390647-30eb-41ca-93cc-5726172d86c8-kube-api-access-2ghs9\") pod \"d0390647-30eb-41ca-93cc-5726172d86c8\" (UID: \"d0390647-30eb-41ca-93cc-5726172d86c8\") "
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.339730 908 operation_generator.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/d0390647-30eb-41ca-93cc-5726172d86c8-cloud-config" (OuterVolumeSpecName: "cloud-config") pod "d0390647-30eb-41ca-93cc-5726172d86c8" (UID: "d0390647-30eb-41ca-93cc-5726172d86c8"). InnerVolumeSpecName "cloud-config". PluginName "kubernetes.io/secret", VolumeGidValue ""
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.343287 908 operation_generator.go:878] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/d0390647-30eb-41ca-93cc-5726172d86c8-kube-api-access-2ghs9" (OuterVolumeSpecName: "kube-api-access-2ghs9") pod "d0390647-30eb-41ca-93cc-5726172d86c8" (UID: "d0390647-30eb-41ca-93cc-5726172d86c8"). InnerVolumeSpecName "kube-api-access-2ghs9". PluginName "kubernetes.io/projected", VolumeGidValue ""
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.418646 908 reconciler_common.go:300] "Volume detached for volume \"kube-api-access-2ghs9\" (UniqueName: \"kubernetes.io/projected/d0390647-30eb-41ca-93cc-5726172d86c8-kube-api-access-2ghs9\") on node \"kube-cp-1\" DevicePath \"\""
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.418696 908 reconciler_common.go:300] "Volume detached for volume \"cloud-config\" (UniqueName: \"kubernetes.io/secret/d0390647-30eb-41ca-93cc-5726172d86c8-cloud-config\") on node \"kube-cp-1\" DevicePath \"\""
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.670881 908 scope.go:115] "RemoveContainer" containerID="8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.704316 908 topology_manager.go:212] "Topology Admit Handler"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: E1023 13:26:43.704617 908 cpu_manager.go:395] "RemoveStaleState: removing container" podUID="d0390647-30eb-41ca-93cc-5726172d86c8" containerName="proxmox-cloud-controller-manager"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: E1023 13:26:43.704775 908 cpu_manager.go:395] "RemoveStaleState: removing container" podUID="d0390647-30eb-41ca-93cc-5726172d86c8" containerName="proxmox-cloud-controller-manager"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.704917 908 memory_manager.go:346] "RemoveStaleState removing state" podUID="d0390647-30eb-41ca-93cc-5726172d86c8" containerName="proxmox-cloud-controller-manager"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.705007 908 memory_manager.go:346] "RemoveStaleState removing state" podUID="d0390647-30eb-41ca-93cc-5726172d86c8" containerName="proxmox-cloud-controller-manager"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.767041 908 scope.go:115] "RemoveContainer" containerID="00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: E1023 13:26:43.767874 908 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3\": not found" containerID="00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.767980 908 pod_container_deletor.go:53] "DeleteContainer returned error" containerID={Type:containerd ID:00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3} err="failed to get container status \"00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3\": rpc error: code = NotFound desc = an error occurred when try to find container \"00e3711823c388eee774b6a9d2c6e8a9ebe00ee8b8f24243fabecfa7f74222e3\": not found"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.768010 908 scope.go:115] "RemoveContainer" containerID="8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: E1023 13:26:43.768488 908 remote_runtime.go:415] "ContainerStatus from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find container \"8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e\": not found" containerID="8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.768523 908 pod_container_deletor.go:53] "DeleteContainer returned error" containerID={Type:containerd ID:8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e} err="failed to get container status \"8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e\": rpc error: code = NotFound desc = an error occurred when try to find container \"8a483e45dc67a5df029d1a8473cabadaa418ab8427d1694c87bacb61da53503e\": not found"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.819850 908 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cloud-config\" (UniqueName: \"kubernetes.io/secret/86a021a4-5f5c-4b6a-b86e-4b28fa9f06c4-cloud-config\") pod \"proxmox-cloud-controller-manager-7b85484c94-dc6ll\" (UID: \"86a021a4-5f5c-4b6a-b86e-4b28fa9f06c4\") " pod="kube-system/proxmox-cloud-controller-manager-7b85484c94-dc6ll"
Oct 23 13:26:43 kube-cp-1 kubelet[908]: I1023 13:26:43.819910 908 reconciler_common.go:258] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-zlkp4\" (UniqueName: \"kubernetes.io/projected/86a021a4-5f5c-4b6a-b86e-4b28fa9f06c4-kube-api-access-zlkp4\") pod \"proxmox-cloud-controller-manager-7b85484c94-dc6ll\" (UID: \"86a021a4-5f5c-4b6a-b86e-4b28fa9f06c4\") " pod="kube-system/proxmox-cloud-controller-manager-7b85484c94-dc6ll"
Oct 23 13:26:46 kube-cp-1 kubelet[908]: I1023 13:26:46.187057 908 kubelet_volumes.go:161] "Cleaned up orphaned pod volumes dir" podUID=d0390647-30eb-41ca-93cc-5726172d86c8 path="/var/lib/kubelet/pods/d0390647-30eb-41ca-93cc-5726172d86c8/volumes"
I've updated my init step by adding a .nodeRegistration.kubeletExtraArgs entry:
- name: Create init conf file (for adding serverTLSBootstrap option)
  copy:
    dest: /etc/kubernetes/kubeadm-init.yaml
    content: |
      apiVersion: kubeadm.k8s.io/v1beta3
      kind: ClusterConfiguration
      controlPlaneEndpoint: "{{ kube_endpoint }}:6443"
      ---
      apiVersion: kubelet.config.k8s.io/v1beta1
      kind: KubeletConfiguration
      serverTLSBootstrap: true
      ---
      apiVersion: kubeadm.k8s.io/v1beta3
      kind: InitConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: "external"
    mode: 0644
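For completeness, the generated file is then meant to be passed to kubeadm on the first control-plane node, e.g. (a sketch):

kubeadm init --config /etc/kubernetes/kubeadm-init.yaml --upload-certs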
My kubelet service starts fine with the --cloud-provider=external option:
[root@kube-cp-1 ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2023-10-23 13:58:18 CEST; 21min ago
Docs: https://kubernetes.io/docs/
Main PID: 8726 (kubelet)
Tasks: 14 (limit: 10842)
Memory: 102.6M
CPU: 22.736s
CGroup: /system.slice/kubelet.service
└─8726 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cloud-provider=external >
The errors are still present in the CCM. I don't know what to do to go further.
I think you need to delete the node resource first and then restart the kubelet, because the node has already been initialized... Also try to set --node-ip=${INTERFACE_IP} in the kubelet params (in case you have more than one IP).
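A sketch of that recovery sequence (node name assumed from this thread; drain first if the node carries workloads):

kubectl drain kube-cp-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node kube-cp-1
# then, on that node:
systemctl restart kubelet   # the node re-registers with the uninitialized taint and the CCM can initialize it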
I think you need to delete the node resource first and then restart the kubelet, because the node has already been initialized... Also try to set --node-ip=${INTERFACE_IP} in the kubelet params (in case you have more than one IP).
My last comment is the result after destroying/recreating the VMs.
I only have a single "public/private" IP per node:
[root@kube-cp-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether e2:4d:3e:3a:e5:e0 brd ff:ff:ff:ff:ff:ff
altname enp0s18
altname ens18
inet 192.168.1.105/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
valid_lft 6501sec preferred_lft 6501sec
inet6 fe80::e04d:3eff:fe3a:e5e0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 46:5b:12:f5:f9:57 brd ff:ff:ff:ff:ff:ff
inet6 fe80::445b:12ff:fef5:f957/64 scope link
valid_lft forever preferred_lft forever
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 2e:ca:5f:4c:50:20 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.219/32 scope global cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::2cca:5fff:fe4c:5020/64 scope link
valid_lft forever preferred_lft forever
5: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether e6:7b:07:d4:36:07 brd ff:ff:ff:ff:ff:ff
inet6 fe80::e47b:7ff:fed4:3607/64 scope link
valid_lft forever preferred_lft forever
7: lxc_health@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 8a:ca:4d:2f:cf:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::88ca:4dff:fe2f:cf07/64 scope link
valid_lft forever preferred_lft forever
can you show:
kubectl describe node kube-cp-1
Yes, sure:
Name: kube-cp-1
Roles: control-plane
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kube-cp-1
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 23 Oct 2023 13:58:07 +0200
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kube-cp-1
AcquireTime: <unset>
RenewTime: Mon, 23 Oct 2023 19:13:57 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Mon, 23 Oct 2023 14:10:44 +0200 Mon, 23 Oct 2023 14:10:44 +0200 CiliumIsUp Cilium is running on this node
MemoryPressure False Mon, 23 Oct 2023 19:11:15 +0200 Mon, 23 Oct 2023 13:58:07 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 23 Oct 2023 19:11:15 +0200 Mon, 23 Oct 2023 13:58:07 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 23 Oct 2023 19:11:15 +0200 Mon, 23 Oct 2023 13:58:07 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 23 Oct 2023 19:11:15 +0200 Mon, 23 Oct 2023 14:09:56 +0200 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.1.105
Hostname: kube-cp-1
Capacity:
cpu: 2
ephemeral-storage: 51285996Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1808300Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 47265173836
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1705900Ki
pods: 110
System Info:
Machine ID: d359f13c93ac46b5b2f1bac975e4cbae
System UUID: d359f13c-93ac-46b5-b2f1-bac975e4cbae
Boot ID: 57aa57fe-79c3-408b-b5cb-7d32ed57b8f2
Kernel Version: 5.14.0-284.30.1.el9_2.x86_64
OS Image: Rocky Linux 9.2 (Blue Onyx)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.24
Kubelet Version: v1.27.3
Kube-Proxy Version: v1.27.3
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system cilium-2brfz 100m (5%) 0 (0%) 100Mi (6%) 0 (0%) 5h9m
kube-system etcd-kube-cp-1 100m (5%) 0 (0%) 100Mi (6%) 0 (0%) 5h15m
kube-system kube-apiserver-kube-cp-1 250m (12%) 0 (0%) 0 (0%) 0 (0%) 5h15m
kube-system kube-controller-manager-kube-cp-1 200m (10%) 0 (0%) 0 (0%) 0 (0%) 5h15m
kube-system kube-proxy-j276n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5h15m
kube-system kube-scheduler-kube-cp-1 100m (5%) 0 (0%) 0 (0%) 0 (0%) 5h15m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 750m (37%) 0 (0%)
memory 200Mi (12%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
Oh, it does not have the alpha.kubernetes.io/provided-node-ip annotation.
Check the kubelet params again with ps axfwww - it has to have --cloud-provider=external.
Maybe you need to run systemctl daemon-reload too.
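A quick way to verify both points (a sketch; node name assumed):

ps axfwww | grep '[k]ubelet'          # the command line must contain --cloud-provider=external
kubectl get node kube-cp-1 -o jsonpath='{.metadata.annotations.alpha\.kubernetes\.io/provided-node-ip}{"\n"}'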
I've progressed.
Thanks a lot for all your work and the time you've given me.
In my requirements Ansible scripts, I've added the following step:
- name: Add node IP in kubelet config
  lineinfile:
    path: /etc/sysconfig/kubelet
    regexp: '^KUBELET_EXTRA_ARGS='
    line: KUBELET_EXTRA_ARGS=--node-ip={{ ansible_default_ipv4.address }} --cloud-provider=external
    state: present
    create: yes
    mode: a+r
So, all my nodes have the alpha.kubernetes.io/provided-node-ip annotation.
But they now have the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint.
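For reference, the remaining taints can be listed like this (a sketch):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints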
Here is my Helm install step:
- name: Install Proxmox CCM chart
  kubernetes.core.helm:
    name: proxmox-cloud-controller-manager
    namespace: kube-system
    chart_ref: oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
    values:
      config:
        clusters:
          - url: "{{ proxmox_url }}"
            insecure: false
            token_id: "kubernetes@pve!ccm"
            token_secret: "xxxxxxxxxxxxxxxxxxx"
            region: main
      enabledControllers:
        - cloud-node
        - cloud-node-lifecycle
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
Very strange behavior: the CCM can't interact with my nodes because coredns is in a Pending state.
When I edit the coredns deployment and add this toleration:
- effect: NoSchedule
  key: node.cloudprovider.kubernetes.io/uninitialized
  value: "true"
the CCM removes the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint and all the pods go into a Running state.
Is it normal that I had to update the coredns deployment to make it work?
Congratulations!!! You've got it 👍
Yep, coredns should have this toleration. Or you can run the CCM as a DaemonSet with host networking... But we do not patch the network components - it doesn't make sense.
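After adding the toleration, CoreDNS scheduling can be confirmed with something like this (label assumed from a standard kubeadm install):

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide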
I've finally succeeded in making it work!
The main parts are:
- name: Add node IP in kubelet config
  lineinfile:
    path: /etc/sysconfig/kubelet
    regexp: '^KUBELET_EXTRA_ARGS='
    line: KUBELET_EXTRA_ARGS=--node-ip={{ ansible_default_ipv4.address }} --cloud-provider=external
    state: present
    create: yes
    mode: a+r
- name: Patch coredns tolerations
  kubernetes.core.k8s:
    kind: Deployment
    name: coredns
    namespace: kube-system
    definition:
      spec:
        template:
          spec:
            tolerations:
              - key: node.cloudprovider.kubernetes.io/uninitialized
                effect: NoSchedule
                operator: Exists
  become: no
- name: Install Proxmox CCM chart
  kubernetes.core.helm:
    name: proxmox-cloud-controller-manager
    namespace: kube-system
    chart_ref: oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
    values:
      config:
        clusters:
          - url: "{{ proxmox_url }}"
            insecure: false
            token_id: "kubernetes@pve!ccm"
            token_secret: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
            region: main
      enabledControllers:
        - cloud-node
        - cloud-node-lifecycle
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
- name: Create Proxmox CSI namespace
  kubernetes.core.k8s:
    state: present
    definition:
      api_version: v1
      kind: Namespace
      metadata:
        name: csi-proxmox
        labels:
          app.kubernetes.io/managed-by: Helm
          pod-security.kubernetes.io/enforce: privileged
        annotations:
          meta.helm.sh/release-name: proxmox-csi-plugin
          meta.helm.sh/release-namespace: csi-proxmox
- name: Install Proxmox CSI chart
  kubernetes.core.helm:
    name: proxmox-csi-plugin
    namespace: csi-proxmox
    chart_ref: oci://ghcr.io/sergelogvinov/charts/proxmox-csi-plugin
    values:
      config:
        clusters:
          - url: "{{ proxmox_url }}"
            insecure: false
            token_id: "kubernetes-csi@pve!csi"
            token_secret: "yyyyyyyyyyyyyyyyyyyyyyyyyyyy"
            region: main
      node:
        nodeSelector:
        tolerations:
          - operator: Exists
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      storageClass:
        - name: proxmox-data
          storage: local
          reclaimPolicy: Delete
          fstype: ext4
          cache: none
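To sanity-check the CSI part, a throwaway PVC against the proxmox-data class can be created (a sketch; it may stay Pending until a pod consumes it if the class uses WaitForFirstConsumer binding):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-proxmox-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: proxmox-data
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc test-proxmox-pvc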
Next step: replace kube-proxy with Cilium.
Many thanks again for your work!
Maybe some clarifications can be made in the documentation. I may fork your repo and make a PR.
@sergelogvinov the info about the need for --node-ip and/or the alpha.kubernetes.io/provided-node-ip annotation should be the first thing in the install docs :/ I spent a few hours trying to figure out why this simply didn't work, then found this issue and facepalmed, as no other custom cloud controller I've used before needed this option.
@morsik I’m truly sorry to hear that.
I've updated the documentation https://github.com/sergelogvinov/proxmox-cloud-controller-manager/blob/main/docs/install.md#requirements
Thank you for contributing to the project!
@sergelogvinov thank you! It works great after I discovered this simple change, but it was a real pain to understand why I was getting node IP errors; I only found this issue after digging into the source code.
BTW, this is not true at all:
If your node has multiple IP addresses
You explicitly look for that annotation in your source code! I had a single IP address and it still didn't work, for that very reason!
The kubelet sets the value of node.ObjectMeta.Annotations[cloudproviderapi.AnnotationAlphaProvidedIPAddr] during the cluster join process. It can be one or two IPs from different stacks. There are many cases where IPs may fluctuate after a restart... So --node-ip is the recommended way to pin the value.
This list of IPs is set as NodeInternalIP in the Kubernetes node resource.
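The resulting addresses can be inspected with, for example (node name assumed from this thread):

kubectl get node kube-cp-1 -o jsonpath='{range .status.addresses[?(@.type=="InternalIP")]}{.address}{"\n"}{end}'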
The node resource contains many immutable values that the CCM cannot modify after initialization. If you run the kubelet without the --cloud-provider=external
flag initially and then enable it later, the CCM will not make any changes because the node has already been initialized by the kubelet.
Therefore, if you need to change certain kubelet flags, it’s recommended to delete the node resource first to ensure the changes take effect.
@sergelogvinov interesting... I've never seen such an annotation at all.
I just installed a fresh 1.31.1 cluster yesterday; I also previously installed 1.29 and 1.30 fresh and never saw such an annotation, even though I had a single network interface with a single IP.
Regarding "the CCM will not make any changes because the node has already been initialized by the kubelet" - I've already explained in another discussion how to fix this and retrigger initialization ;)
Bug Report
Description
Since I removed kube-proxy and let Cilium handle the routing, my CCM does not label the nodes anymore. I made a fresh install.
Logs
Here are some logs from my proxmox-cloud-controller-manager pod:
Environment
Additional information
I've created my K8S cluster with these Ansible steps:
I've installed Cilium with these Ansible steps:
Here are the Ansible steps I used for deploying the CCM and CSI plugin:
Here is one of my worker nodes: