kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!

Allow Hetzner Volume to be reused by a different node/pod when another pod is terminating #1198

Closed Taronyuu closed 9 months ago

Taronyuu commented 10 months ago

Description

I am currently debugging an issue where one node in my four-node cluster goes offline. I've narrowed the problem down to Neo4J, which is causing a memory spike. When this happens, the node grinds to a complete halt: SSH and Netdata stop working, and Kubernetes marks it as NoSchedule and NoExecute. While the node should ideally not become unavailable in the first place, this part of the system is working as expected.

The next step is that K3s attempts to terminate the previous pod and redeploy it on a different node. This is exactly the desired 'self-healing' behavior. However, although I can see that the new pod is created, it never starts, because attaching its Hetzner Volume fails with a multi-attach error:

Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/mysql-6dd75d7849
Containers:
  mysql:
    Container ID:
    Image:          mysql:8.0
    Image ID:
    Port:           3306/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
...
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           16m   default-scheduler        Successfully assigned project-prod/mysql-6dd75d7849-l585k to k3s-n1-fsn1-wpq
  Warning  FailedAttachVolume  16m   attachdetach-controller  Multi-Attach error for volume "pvc-3418727d-27e8-4a4b-838e-0c5b73ecacb3" Volume is already used by pod(s) mysql-6dd75d7849-b6ppv

The old pod is marked as Terminating:

Status:                    Terminating (lasts 15m)
...
Events:
  Type     Reason        Age   From             Message
  ----     ------        ----  ----             -------
  Warning  NodeNotReady  21m   node-controller  Node is not ready

As far as I know, a Hetzner Volume cannot be attached to multiple nodes. However, is there a way to free the volume from the previous, terminating pod? If not, is there a reason why this is not done? Additionally, is there a way to still allow the cluster to reschedule pods and 'heal' itself when a node goes offline?
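
For context, a sketch of what I could do on the Kubernetes side (the pod and namespace names are taken from the events above), though I am not sure this is the supported approach:

# Show which node the CSI attachment for the PVC still points at.
kubectl get volumeattachments

# Force-remove the pod object stuck in Terminating on the unreachable node.
# Note: this only removes the API object; the volume may still take a while
# (or manual intervention) to be detached from the dead node.
kubectl -n project-prod delete pod mysql-6dd75d7849-b6ppv --grace-period=0 --force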

StefanIGit commented 10 months ago

Hi, sorry, but this sounds like an issue with your mysql deployment. I'm fairly sure this is outside the scope of this project.

Taronyuu commented 9 months ago

I don't think this is a MySQL issue; the question about volumes is platform-agnostic.

However, I can understand if this is not supported or recommended; if that is the case, I would like to hear it :)

StefanIGit commented 9 months ago

Hey, what I mean is that Hetzner Volumes (hetzner-csi) do not support multi-attach. From https://github.com/hetznercloud/csi-driver: "This is a [Container Storage Interface](https://github.com/container-storage-interface/spec) driver for Hetzner Cloud enabling you to use ReadWriteOnce Volumes ..." So it is not a problem this project can fix.

I have no idea why the old mysql pod does not release the PV(C) when it crashes or shuts down. We have Hetzner volumes running in another k8s cluster in production and they work fine: delete a pod, and it gets recreated and starts up without problems.

Since you did not share more details, like the mysql manifest yml, I can only speculate: maybe you have not set valid resource limits for the mysql/Neo4J deployment. Then the memory spike kills the node, when it is the mysql pod that should be killed in this case, which would release the PVC and let the restart work.

bye
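
As an illustration, a minimal sketch of setting such limits (the deployment name and namespace are taken from the events above; the values are placeholders and need tuning to the actual workload). The same idea applies to the Neo4J deployment that causes the spike.

# Placeholder request/limit values; with a memory limit in place the kubelet
# OOM-kills the offending container instead of the whole node running out of memory.
kubectl -n project-prod set resources deployment/mysql \
  --requests=cpu=250m,memory=1Gi \
  --limits=cpu=1,memory=2Gi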

mysticaltech commented 9 months ago

Folks, just use the hcloud CLI to detach the volume, then reattach it to the correct node.
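
A rough sketch of that workaround (the volume name is a placeholder; it also assumes the Hetzner server name matches the node name from the scheduler event above):

# Find the volume backing the stuck PVC and the node the new pod landed on.
hcloud volume list
hcloud server list

# Detach the volume from the unreachable node (placeholder volume name).
hcloud volume detach <volume-name>

# Reattach it to the node where the new pod was scheduled.
hcloud volume attach <volume-name> --server k3s-n1-fsn1-wpq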