sergelogvinov / proxmox-cloud-controller-manager

Kubernetes cloud controller manager for Proxmox
Apache License 2.0

Labels not getting applied #148

Closed · d2orbc closed this 1 month ago

d2orbc commented 1 month ago

Labels aren't being applied. For some reason they did get applied to one of the nodes under some past configuration, and I'm not sure what changed so that the other nodes aren't getting labels now.

All nodes have providerID set (as you can see from the kubectl describe node talos61 output below).

Do I need to recreate these nodes? I thought the labels would be applied to existing nodes and kept updated, for example if a node gets migrated to another host.

Logs

...
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.223202       1 cloud.go:82] clientset initialized
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.230275       1 cloud.go:101] proxmox initialized
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.230300       1 controllermanager.go:319] Starting "cloud-node-controller"
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.233518       1 controllermanager.go:338] Started "cloud-node-controller"
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.233540       1 controllermanager.go:319] Starting "cloud-node-lifecycle-controller"
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.233549       1 node_controller.go:165] Sending events to api server.
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.233660       1 node_controller.go:174] Waiting for informer caches to sync
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.236355       1 controllermanager.go:338] Started "cloud-node-lifecycle-controller"
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager W0920 06:28:43.236374       1 controllermanager.go:315] "service-lb-controller" is disabled
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager W0920 06:28:43.236381       1 controllermanager.go:315] "node-route-controller" is disabled
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.236494       1 node_lifecycle_controller.go:113] Sending events to api server
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.237774       1 reflector.go:289] Starting reflector *v1.Node (20h11m8.648194647s) from pkg/mod/k8s.io/client-go@v0.29.4/tools/cache/reflector.go:229
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.237795       1 reflector.go:325] Listing and watching *v1.Node from pkg/mod/k8s.io/client-go@v0.29.4/tools/cache/reflector.go:229
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.240282       1 reflector.go:351] Caches populated for *v1.Node from pkg/mod/k8s.io/client-go@v0.29.4/tools/cache/reflector.go:229
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.334804       1 instances.go:111] instances.InstanceMetadata() called, node: talos2
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.343935       1 instances.go:111] instances.InstanceMetadata() called, node: talos3
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.350976       1 instances.go:111] instances.InstanceMetadata() called, node: talos61
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.357710       1 instances.go:111] instances.InstanceMetadata() called, node: talos62
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.364401       1 instances.go:111] instances.InstanceMetadata() called, node: talos63
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.370716       1 instances.go:111] instances.InstanceMetadata() called, node: talos1
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:43.378913       1 node_controller.go:267] Update 6 nodes status took 44.248552ms.
kube-system proxmox-cloud-controller-manager-5cdf5c9ccb-z8646 proxmox-cloud-controller-manager I0920 06:28:59.896086       1 httplog.go:132] "HTTP" verb="GET" URI="/healthz" latency="131.347µs" userAgent="kube-probe/1.30" audit-ID="" srcIP="10.244.0.114:51404" resp=200
[admin@cluster-talos] cluster-talos/node-management +master $ kubectl get node -o wide --show-labels
NAME      STATUS                     ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME     LABELS
talos1    Ready                      control-plane   29d   v1.30.3   10.0.1.11     <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=
talos2    Ready                      control-plane   29d   v1.30.3   10.0.1.12     <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=
talos3    Ready                      control-plane   29d   v1.30.3   10.0.1.13     <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=
talos61   Ready                      <none>          29d   v1.30.3   10.0.1.161    <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos61,kubernetes.io/os=linux
talos62   Ready,SchedulingDisabled   <none>          29d   v1.30.3   10.0.1.162    <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos62,kubernetes.io/os=linux
talos63   Ready                      <none>          8d    v1.30.3   10.0.1.163    <none>        Talos (v1.7.6)   6.6.43-talos     containerd://1.7.18   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=8VCPU-16GB,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=cozy,failure-domain.beta.kubernetes.io/zone=noir,kubernetes.io/arch=amd64,kubernetes.io/hostname=talos63,kubernetes.io/os=linux,node.kubernetes.io/instance-type=8VCPU-16GB,topology.kubernetes.io/region=cozy,topology.kubernetes.io/zone=noir

Environment

Name:               talos61
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=talos61
                    kubernetes.io/os=linux
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.1.161
                    csi.volume.kubernetes.io/nodeid: {"smb.csi.k8s.io":"talos61"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 22 Aug 2024 01:07:26 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  talos61
  AcquireTime:     <unset>
  RenewTime:       Fri, 20 Sep 2024 02:31:08 -0400
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 22 Aug 2024 01:08:24 -0400   Thu, 22 Aug 2024 01:08:24 -0400   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Fri, 20 Sep 2024 02:27:13 -0400   Fri, 13 Sep 2024 02:54:31 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 20 Sep 2024 02:27:13 -0400   Fri, 13 Sep 2024 02:54:31 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 20 Sep 2024 02:27:13 -0400   Fri, 13 Sep 2024 02:54:31 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 20 Sep 2024 02:27:13 -0400   Fri, 20 Sep 2024 02:23:17 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.1.161
  Hostname:    talos61
Capacity:
  cpu:                8
  ephemeral-storage:  31500Mi
  hugepages-2Mi:      0
  memory:             32852276Ki
  pods:               110
Allocatable:
  cpu:                7950m
  ephemeral-storage:  29458694095
  hugepages-2Mi:      0
  memory:             32553268Ki
  pods:               110
System Info:
  Machine ID:                 744a686afa2c547a06a805db619b76c1
  System UUID:                7c755c32-67a7-4890-9b22-9ab156632402
  Boot ID:                    a75ffec3-16d3-422c-98de-9719e7df3799
  Kernel Version:             6.6.43-talos
  OS Image:                   Talos (v1.7.6)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.18
  Kubelet Version:            v1.30.3
  Kube-Proxy Version:         v1.30.3
PodCIDR:                      10.244.5.0/24
PodCIDRs:                     10.244.5.0/24
ProviderID:                   proxmox://cozy/6061
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                          ------------  ----------  ---------------  -------------  ---
  default                     cloudflared-deployment-84ddcb7f66-mljqs       0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  default                     leaderboard-summarizer-66f557bdc6-w8fsh       500m (6%)     10 (125%)   18000Mi (56%)    22000Mi (69%)  12m
  kube-system                 cilium-7xqwp                                  100m (1%)     0 (0%)      10Mi (0%)        0 (0%)         11m
  kube-system                 csi-smb-node-nb4r8                            30m (0%)      0 (0%)      60Mi (0%)        400Mi (1%)     11m
  kube-system                 kube-proxy-ddq2r                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         11m
  kube-system                 metrics-server-55677cdb4c-s2jsr               100m (1%)     0 (0%)      200Mi (0%)       0 (0%)         7d11h
  kubernetes-dashboard        kubernetes-dashboard-kong-7696bb8c88-t7q5j    0 (0%)        0 (0%)      0 (0%)           0 (0%)         64m
  metallb-system              speaker-xh9dn                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         10m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                730m (9%)      10 (125%)
  memory             18270Mi (57%)  22400Mi (70%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
Events:
  Type     Reason                   Age                From        Message
  ----     ------                   ----               ----        -------
  Normal   Starting                 10m                kube-proxy  
  Normal   Shutdown                 12m                kubelet     Shutdown manager detected shutdown event
  Normal   NodeNotReady             12m                kubelet     Node talos61 status is now: NodeNotReady
  Normal   Starting                 11m                kubelet     Starting kubelet.
  Normal   NodeAllocatableEnforced  11m                kubelet     Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  11m                kubelet     Node talos61 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    11m                kubelet     Node talos61 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     11m                kubelet     Node talos61 status is now: NodeHasSufficientPID
  Warning  Rebooted                 11m (x2 over 11m)  kubelet     Node talos61 has been rebooted, boot id: a75ffec3-16d3-422c-98de-9719e7df3799
  Normal   NodeReady                11m (x2 over 11m)  kubelet     Node talos61 status is now: NodeReady
  Normal   NodeHasSufficientMemory  7m57s              kubelet     Node talos61 status is now: NodeHasSufficientMemory
  Normal   NodeReady                7m57s              kubelet     Node talos61 status is now: NodeReady
  Normal   Starting                 7m57s              kubelet     Starting kubelet.
  Normal   NodeHasNoDiskPressure    7m57s              kubelet     Node talos61 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     7m57s              kubelet     Node talos61 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             7m57s              kubelet     Node talos61 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  7m57s              kubelet     Updated Node Allocatable limit across pods
  Warning  InvalidDiskCapacity      7m57s              kubelet     invalid capacity 0 on image filesystem
  Normal   Starting                 4m1s               kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      4m1s               kubelet     invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  4m1s               kubelet     Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  4m1s               kubelet     Node talos61 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    4m1s               kubelet     Node talos61 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     4m1s               kubelet     Node talos61 status is now: NodeHasSufficientPID
d2orbc commented 1 month ago

Oh, yeah, I created a new node in the cluster and it got the labels right away.

d2orbc commented 1 month ago

> I thought the labels would be applied to existing nodes and kept updated, for example if a node gets migrated to another host.

I see now this isn't true.
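
For anyone else who hits this: as far as I can tell, the cloud-node-controller only applies the provider labels (instance type and topology region/zone) while a node still carries the node.cloudprovider.kubernetes.io/uninitialized taint, which the kubelet adds at registration when it runs with --cloud-provider=external. The CCM removes that taint after initializing the node, so nodes that were already registered (or that registered without the taint) are never re-labelled, even if the VM later moves to another Proxmox host. A quick way to check an affected node, assuming plain kubectl access:

kubectl get node talos61 -o jsonpath='{.spec.providerID}{"\n"}{.spec.taints}{"\n"}'
kubectl get node talos61 --show-labels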

sergelogvinov commented 1 month ago

Kubernetes has a lot of immutable values. There is no simple way to move an instance from one zone/region to another. It's better to use drain/cordon techniques...
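
For a single worker that is roughly the flow below (a sketch only; it assumes the Talos kubelet re-registers the node with the uninitialized taint after the Node object is deleted, and the talosctl step is an assumption rather than something verified in this thread):

kubectl cordon talos61
kubectl drain talos61 --ignore-daemonsets --delete-emptydir-data
# delete the Node object so the kubelet re-registers it with the uninitialized taint
kubectl delete node talos61
# restart the kubelet on the machine to force re-registration
talosctl -n 10.0.1.161 service kubelet restart

Once the CCM removes the uninitialized taint from the re-registered node it becomes schedulable again, so no separate uncordon should be needed on the new Node object.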