medik8s / self-node-remediation

Automatic repair for unhealthy Kubernetes nodes
https://www.medik8s.io/
Apache License 2.0

ARM builds for Raspberry Pi #157

Open mattibal opened 10 months ago

mattibal commented 10 months ago

Hi, I would like to run Self Node Remediation and Node Health Check operators in a cluster made of Raspberry Pis.

I see that there aren't ARM builds published on quay.io. I understand that you might not be interested in publishing them for such a small use case... I could try to build them myself if there are some easy instructions on how to do that.

Has anybody tried to do this? Do you know if the operators would work well on Raspberry Pi?

mshitrit commented 10 months ago

Hi @mattibal, thanks for taking an interest in medik8s! Can you add some more details on the setup/use cases you are trying to build?

To answer your initial question: as far as I know this setup hasn't been tried yet (you've probably figured that from my question) but we are always happy to explore new things.

mattibal commented 10 months ago

Hi @mshitrit, first of all this is just something I'm playing with at home, so nothing serious :)

I want to make a 3-node Kubernetes cluster using K3S (not OCP), where each node acts as control-plane, worker and etcd node. The nodes are Raspberry Pi 4 (soon 5) 8GB boards, or maybe one of them could be an x86 PC.

The cluster should run some single instance services as StatefulSets (something like Home Assistant, Samba) and a Rook-Ceph cluster as storage, where each node has 1 NVMe OSD. The services use ReadWriteOnce PVCs to mount Ceph RBD volumes.

Since I want the services to stay up even if 1 of the 3 nodes goes down, I think I need something that automatically adds an out-of-service taint to the failed node. Otherwise, the pods that were running on the failed node would not be automatically launched on another node until the out-of-service taint is added manually. If I understood correctly, Medik8s could do that, even if I guess this is not exactly the use case it was designed for.
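For reference, the manual step described above can be sketched as follows. This is only an illustration of the taint itself, not of how Medik8s applies it: the key `node.kubernetes.io/out-of-service` and effect `NoExecute` come from the Kubernetes documentation on non-graceful node shutdown, the value `nodeshutdown` is the one used in that documentation's example, and the helper name `with_out_of_service_taint` is hypothetical. On the command line, the equivalent is `kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`.

```python
import copy

# Taint from the Kubernetes docs on non-graceful node shutdown.
# The value "nodeshutdown" is the docs' example value (an assumption here).
OUT_OF_SERVICE_TAINT = {
    "key": "node.kubernetes.io/out-of-service",
    "value": "nodeshutdown",
    "effect": "NoExecute",
}


def with_out_of_service_taint(node_spec):
    """Return a copy of a node spec dict with the out-of-service taint added once.

    `node_spec` mirrors the `spec` field of a Node object; the input is not mutated.
    """
    spec = copy.deepcopy(node_spec)
    # The API may report "taints" as missing or null when a node has none.
    taints = list(spec.get("taints") or [])
    already_tainted = any(
        t.get("key") == OUT_OF_SERVICE_TAINT["key"]
        and t.get("effect") == OUT_OF_SERVICE_TAINT["effect"]
        for t in taints
    )
    if not already_tainted:
        taints.append(dict(OUT_OF_SERVICE_TAINT))
    spec["taints"] = taints
    return spec


if __name__ == "__main__":
    failed_node_spec = {
        "taints": [{"key": "node.kubernetes.io/unreachable", "effect": "NoExecute"}]
    }
    for taint in with_out_of_service_taint(failed_node_spec)["taints"]:
        print(taint)
```

The taint should only be added once the node is confirmed to be powered off or otherwise fenced, and it has to be removed again after the node recovers.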

slintes commented 10 months ago

Hi @mattibal , interesting project 👍🏼

Yes, you can use medik8s for adding the out-of-service taint. Additionally, we try to reboot the unhealthy node, in order to

How are you planning to deploy NHC and SNR? Are you going to install the Operator Lifecycle Manager? AFAIK OLM already provides arm64 images, so that part shouldn't be a problem. When you click "Install" on e.g. https://operatorhub.io/operator/node-healthcheck-operator, you will get instructions for installing OLM. Currently OLM is the only supported method to deploy our operators.

About the missing arm64 images of our operators: we don't have this on our roadmap yet, and at the moment we are prioritizing other features. But we are open to contributions from the community! This page is probably a good starting point: https://sdk.operatorframework.io/docs/advanced-topics/multi-arch/ Don't hesitate to ask further questions, we are happy to help 🙂
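As a rough sketch of what a self-built multi-arch image could involve, the snippet below assembles a `docker buildx build` invocation for amd64 and arm64. It rests on assumptions not confirmed in this thread: the image name is hypothetical, the helper `buildx_command` is made up for illustration, and the operator's Dockerfile may need changes (for example, if it hardcodes the target architecture) before an arm64 build succeeds.

```python
import shlex


def buildx_command(image, platforms=("linux/amd64", "linux/arm64"), context=".", push=True):
    """Assemble a `docker buildx build` argument list for a multi-arch image."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # comma-separated list of target platforms
        "--tag", image,
    ]
    if push:
        # Multi-platform results are usually pushed straight to a registry.
        cmd.append("--push")
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Hypothetical image name; replace with a registry and repository you control.
    cmd = buildx_command("quay.io/example/self-node-remediation-operator:dev")
    print(shlex.join(cmd))
    # To actually run the build: subprocess.run(cmd, check=True)
```

Only the operator image is covered here; since OLM is the supported deployment method, a matching bundle or catalog referencing the self-built image would presumably also be needed.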