syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0

Create Event+Condition if RAID has issues #1214

Open guettli opened 6 months ago

guettli commented 6 months ago

/kind feature

If there are issues with the RAID, the controller should create an Event and set a Condition.

Example: both arrays (md0 and md1) are missing one drive each ([2/1]).

The controller should tell us about this.

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid1 nvme1n1p2[1]
      1046528 blocks super 1.2 [2/1] [_U]

md1 : active raid1 nvme1n1p3[1]
      498400576 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

unused devices: <none>
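
A minimal sketch of how such a check could look in Go, assuming the controller (or some other agent) already has the text of /proc/mdstat, for example fetched over SSH. The names `DegradedArray` and `FindDegradedArrays` are hypothetical and not part of caph. The function only compares the two numbers in the `[2/1]` status field; it does not look at rebuild progress or failed-device markers.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// DegradedArray describes one md array with fewer active devices than
// configured. Hypothetical type for illustration, not part of caph.
type DegradedArray struct {
	Name     string // e.g. "md0"
	Expected int    // configured device count, the 2 in [2/1]
	Active   int    // active device count, the 1 in [2/1]
}

var (
	// Matches the first line of an array entry, e.g. "md0 : active raid1 ...".
	arrayLine = regexp.MustCompile(`^(md\S*) : `)
	// Matches the status field, e.g. "[2/1] [_U]".
	statusField = regexp.MustCompile(`\[(\d+)/(\d+)\] \[[U_]+\]`)
)

// FindDegradedArrays scans /proc/mdstat content and returns every array
// whose active device count is below its configured device count.
func FindDegradedArrays(mdstat string) []DegradedArray {
	var degraded []DegradedArray
	current := "" // name of the array whose status line we are waiting for
	scanner := bufio.NewScanner(strings.NewReader(mdstat))
	for scanner.Scan() {
		line := scanner.Text()
		if m := arrayLine.FindStringSubmatch(line); m != nil {
			current = m[1]
			continue
		}
		if current == "" {
			continue
		}
		if m := statusField.FindStringSubmatch(line); m != nil {
			// The regexp only captures digits, so Atoi errors are ignored here.
			expected, _ := strconv.Atoi(m[1])
			active, _ := strconv.Atoi(m[2])
			if active < expected {
				degraded = append(degraded, DegradedArray{current, expected, active})
			}
			current = ""
		}
	}
	return degraded
}

func main() {
	data, err := os.ReadFile("/proc/mdstat")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/mdstat:", err)
		os.Exit(1)
	}
	for _, a := range FindDegradedArrays(string(data)) {
		fmt.Printf("%s is degraded: %d of %d devices active\n", a.Name, a.Active, a.Expected)
	}
}
```

For the output above, this would report both md0 and md1. The result could then be turned into an Event and a Condition on the machine object; how that mapping should look is left open here.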

guettli commented 1 month ago

There are many things that could be monitored on bare-metal machines.

My gut feeling is that this check makes sense, but that it is not the job of caph.

But which tool should own that task?

Maybe it is better to do that in a DaemonSet with root privileges inside the workload cluster.

@batistein what do you think?