RFE: Constrained deployment scenarios (Edge)

timothysc commented 4 years ago

This is a FEATURE REQUEST?

Currently we do not advertise the minimum target environments where we support kubeadm and do testing on. The purpose of this issue is to:

advertise the min-target
setup CI-signal for constrained target
trace and eliminate bloat in daemons (there is a lot)
Channel the community back towards k8s proper vs. forks

Versions

kubeadm version (use kubeadm version): *

Environment: Constrained arm-like environments <2GB of memory and limited CPU.

What happened?

k8s-edge deployment scenarios with a full control. a.k.a - kubernetes power toaster.

What you expected to happen?

We should have CI and it should just work.

/cc @wojtek-t - to help trace the bloat and weird perf issues.

BenTheElder commented 4 years ago

xref: https://github.com/kubernetes-sigs/kind/issues/485

Infrequently I check KIND in docker for mac's minimum settings (~1GB ram .5GB swap, 1 CPU IIRC), we could definitely do better.

One resource that doesn't get much attention (and tbh I don't really expect to but...) for cheap / underpowered environments is disk I/O ... xref: https://github.com/kubernetes-sigs/kind/issues/845

timothysc commented 4 years ago

We used to add a lot of caching which is totally un-necessary if you bin-smash etcd into the api-server.

Also init routines are the devil.

khenidak commented 4 years ago

@timothysc there is a use case for a completely stateless clusters. edge clusters that connect to a command and control cloud or so to download what needs to be running etc might be a candidate for bin smashing etcd into api server with no storage at all

just thinking out loud..

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

warmchang commented 4 years ago

/remove-lifecycle stale

neolit123 commented 3 years ago

had a discussion with @timothysc about this today.

i tried googling what people are doing and found some interesting results:

this article suggests that kubeadm on a recent RPI "just works" without any hacks on the kubeadm side: https://opensource.com/article/20/6/kubernetes-raspberry-pi (recommends 4GB RAM, but i'm sure the control plane can work win <2GB too)
Alex Ellis blog that endorses k3s over kubeadm: https://blog.alexellis.io/test-drive-k3s-on-raspberry-pi/ one main argument seems to be the k3s download size being only 40MB and includes, bootstrapping, control-plane components, storage, containerd and kubectl. i don't see how that is possible without forking projects and trimming away features from them. that might be great for end users, but not so great for maintainers. also Alex points out that k3s just worked, the first time.
kubeedge which is becoming the leading framework for kubernetes on edge, uses kubeadm for its guide: https://docs.kubeedge.io/en/v0.2.1/setup/setup.html
this guide from 2018 also mentions that kubeadm works, with some caveats around image pull and preflight checks: https://elastisys.com/kubernetes-raspberry-pi/ (those issues should be resolved at this point).

also in the past few years i've seen more users mentioning the RPI with kubeadm just works for them, transparently. this could have been due to a number of factors including fixes around Go on ARM and k8s control-plane fixes that went under the kubeadm maintainer radar.

what i'd would like to gather more details about here is the following:

"edge" is a very broad term...what are the specs for a device where kubeadm does not work on?
what additional flags/customizations for components have users applied to make their kubeadm cluster work better on such devices?

afbjorklund commented 3 years ago

I did some home experiments over the summer, on different versions of the Raspberry Pi (from 0 to 4)

Earlier I had been using kubernetes 1.10 on Raspberry Pi 2, so that was my baseline - from years ago.

My conclusions:

The raspberry pi 0w and original 1 were OK for running containers on, but not really for kubernetes (at all). They run OK with Docker Swarm or Hashicorp Nomad, as long as you can still find armv6 binaries. So while easy to find, and rather cute in the case of the Zero, I don't think these are enough anymore. There are some hints about a future light version of k3s, that might be able to target these boards.

The raspberry pi 2 runs much better with k3s, mostly due to the available 860M - not enough to run k8s. I was able to get k3s (only) to boot on raspberry pi 0w too, but it required using zram (compressed swap) Since it runs linux/armv7, you can just run regular arm container images - without having to rebuild. The performance is not great, but on the other hand cooling is much easier - they don't really need any.

The raspberry pi 3 and 4 runs regular k8s, mostly out of the box. I made it harder by using arm64. When you get the later model you also get enough memory (2G), to not have to worry about any swap. By default Raspberry Pi OS will use arm for compatibility, even though it has a armv8 CPU available. These models are much faster, but they also get much hotter. So some cooling becomes necessary.

More details:

I made my own custom Linux distribution, and then did my own custom Kubernetes distribution... It has benefits when it comes to the footprint, but also some issues when it comes to upgrading. Was a continuation/replacement of my work on docker-machine/podman-machine, now deprecated. I wrote about it on the blog: https://boot2podman.github.io/2020/07/10/proof-of-concept-release.html

Community:

These people have some great resources about running k8s@home, but usually in the "rack scale". Like building custom cases and overclocking and so on. Not really the simple cluster from "Appendix A"*.

https://github.com/Raspbernetes

* that would be from the book "Kubernetes: Up and Running": "Building a Raspberry Pi Kubernetes Cluster"

This thesis is still the best motivation that I have seen of why you should build a Raspberry Pi cluster. There are some similar motivations in the book above as well, beyond it being both affordable and fun!

https://kubecloud.io/

And this is of course the original distribution, less needed now when everyone supports ARM but anyway. Also much easier now that docker is available as a regular package in Raspbian (and in Ubuntu ARM too)

https://hypriot.com/

timothysc commented 3 years ago

There are a number of optimizations that could be done if we start to entertain:

Bin-smashing control plane components into a single component and remove caching and overhead. (agent-worker model). This is a massive amount of work but I think could be done in stages starting with etcd<>apiserver. There is really no reason we can't bundle them into a single binary and simplify the deployment and reduce a large amount of overhead.

Hard audit on the kubelet overhead to reduce startup costs.

afbjorklund commented 3 years ago

Bin-smashing control plane components into a single component and remove caching and overhead.

This is what k3s is doing, busybox type of binary. That and moving from etcd to sqlite are the biggest wins.

timothysc commented 3 years ago

I think we need to stay true to kubernetes for a number of reasons. sqlite may be fine for small environments but it violates the core reason why etcd was chosen.... CP storage. So I think there is a lot of low hanging fruit that we can tackle.

afbjorklund commented 3 years ago

Here was the difference in installed size: (from the distribution mentioned above)

k8s (arm64): 487M rootfs + 997M images k3s (armv6): 269M rootfs

So it is 50% of the OS, but 25% of the total... (The arch difference might skew it a little too)

The 40M binary is because it is self-extracting. (That is, it is actually the compressed size of k3s)

With k3s being accepted into the CNCF, it is now a question of Deployment form factor rather than a fork. More like features missing (alpha/beta and in-tree and cloud and such), and a different type of database ?

There was originally a similar such project in minikube (called "localkube") that served the same purpose. But the extra optimizations takes this one down to 50% of that, so it's like 2G vs 1G vs 512M of memory.

Anyway, that's how I ended up with selecting k3s for the Raspberry Pi 2 - but k8s for the Raspberry Pi 3.

As mentioned, k3s also has lots of "batteries included" that could be looked upon as "inspiration" for k8s.

Simple but powerful “batteries-included” features have been added, such as: a local storage provider, a service load balancer, a Helm controller, and the Traefik ingress controller.

Having similar features in e.g. kubeadm would lessen the need for minikube start --driver=none etc.

neolit123 commented 3 years ago

@afbjorklund thank you for the useful details. you seem to have a lot of experience with running Kubernetes on RPI.

as someone who is not a RPI user, looking at this: https://socialcompare.com/en/comparison/raspberrypi-models-comparison

and WRT kubeadm system-reqs, i can see one can buy a 35$ board ( Raspberry Pi 4 B) that should run a kubernetes control-plane and then for worker nodes 10$ boards (such as Raspberry Pi Zero W) should suffice for non-heaving workloads.

usability of kubeadm vs k3s aside, i'm wondering how much picking k3s over kubeadm here is the result of budget vs "i have some old boards lying around that i want to run a cluster on"

also something that i'm very curious about is, can you explain why did you need to run Kuberentes on RPI, other than experimentation? do you have real world examples from other users?

Having similar features in e.g. kubeadm would lessen the need for minikube start --driver=none etc.

kubeadm has the principle of doing a minimal viable cluster with only required components and deviating from stock k8s as less as possible. instead of deviating from a component default value, we actually try to go to the component maintainers and try to push changes in the defaults if the defaults are not sane.

afbjorklund commented 3 years ago

@afbjorklund thank you for the useful details. you seem to have a lot of experience with running Kubernetes on RPI.

Thank you! There are of course lots of other little ARM details, and also it has changed over the Kubernetes years...

I recommend the summary from Alex Ellis: https://www.raspberrypi.org/blog/five-years-of-raspberry-pi-clusters/

usability of kubeadm vs k3s aside, i'm wondering how much picking k3s over kubeadm here is the result of budget vs "i have some old boards lying around that i want to run a cluster on"

I got the feeling that kubeadm sort of "gave up" on raspberry pi (around 1.10), and that k3s has revitalized it again. But I was also running with cri-o instead of containerd, so a lot of it was also simply about trying out the alternatives.

I did some presentations on it last year: https://boot2podman.github.io/2019/02/19/containers-without-docker.html https://boot2podman.github.io/2019/04/09/boot2podman-kubernetes.html if you want to know more (boards were new).

also something that i'm very curious about is, can you explain why did you need to run Kuberentes on RPI, other than experimentation? do you have real world examples from other users?

Well, as I was trying to explain (and link) above there is for me not so much the need but that it can be done. If you just want to get started, it is much "easier" to just run virtual hardware instead. So I do that as well...

I know that Rancher has a lot of customers running lots of small clusters, and they talked about it on Kubecon. Most of it is about being able to use the same tools on the edge, that they are already using in the cloud ?

The actual reason I started was that Ops wouldn't allow Docker for security reasons, and Dev were running Swarm.

So I started with running Kubernetes on virtual machines, and then built my own physical cluster on a Hackathon...

Having similar features in e.g. kubeadm would lessen the need for minikube start --driver=none etc.

kubeadm has the principle of doing a minimal viable cluster with only required components and deviating from stock k8s as less as possible. instead of deviating from a component default value, we actually try to go to the component maintainers and try to push changes in the defaults if the defaults are not sane.

I guess I meant more "addons" than actual code or configuration changes.

But maybe that is outside the scope, and better handled by someone else.

As far as I can tell, Rancher (now SUSE) is doing the right thing. Their distribution is certified and patches go upstream. It is at least not worse than what Red Hat is doing with OpenShift, but I wouldn't call either of them "vanilla kubernetes"...

If you look at the latest projects from this year, I gave up on tinkering with podman and k3s and just went with docker and k8s. It does double the requirements a couple of times - but edge hardware is also improving (like for instance the Raspberry Pi 4)

But now the summer is over, and the projects are all "done".

Most likely I will be doing something different, this Hacktober.

afbjorklund commented 3 years ago

WRT kubeadm system-reqs, i can see one can buy a 35$ board ( Raspberry Pi 4 B) that should run a kubernetes control-plane and then for worker nodes 10$ boards (such as Raspberry Pi Zero W) should suffice for non-heaving workloads.

You don't want to be running Raspberry Pi Zero W (or 1), unless you want to build your own images... They are running linux/armv6, and the default "arm" images will be for linux/armv7 and won't run.

When it comes to the Raspberry Pi 2, 3, and 4 they all cost the same ($35) so there are "other factors". They have different performance vs. heat generation characteristics, and different USB generations, etc.

I went with model 3B for "arm64", as a trade-off.

Probably around $250, for a four-node cluster ?

I think that kubeadm should keep armv7 and 1 GB of memory as a minimum, to exclude the older models. Running on armv6 and 512 MB requires tradeoffs that perhaps won't be acceptable to the main project...

Even running on armv7 with 1 GB will be a compromise, and might require enabling swap (on the master). Running on armv8 with 2 GB memory (Pi 4) is probably the best, since it will be more like a "real" cluster ?

I would probably just leave the "single binary" and "etcd alternatives" up to k3s - since it is already available ? This also happened in minikube, when support for "localkube" bootstrapper was dropped in favor of "kubeadm".

It doubles requirements (to 2 CPU and 2 GB), but makes it easier for the maintainers to not have a custom distro. Having to support both "arm" and "arm64" is annoying, but probably very much needed - at least for the short term.

Sticking with a 32-bit userland has the benefit that the same image will run on every board from a 2011-era alpha board to today’s shiny new 8GB product.

neolit123 commented 3 years ago

usability of kubeadm vs k3s aside, i'm wondering how much picking k3s over kubeadm here is the result of budget vs "i have some old boards lying around that i want to run a cluster on"

I got the feeling that kubeadm sort of "gave up" on raspberry pi (around 1.10), and that k3s has revitalized it again. But I was also running with cri-o instead of containerd, so a lot of it was also simply about trying out the alternatives.

the interest was reduced around that time, mostly due to the lack of CI resources. if we were able to get a way to request ARM VMs this would have been great and we could have said that ARM is truly supported.

nowadays we have ARM (the company) approaching Kubernetes looking for running clusters in CI, to more officially claim the support. similar goes for IBM and their hardware.

i guess, we could also try QEMU and see how that goes, but instead we prefer the vendors to take action here and own their support.

I recommend the summary from Alex Ellis: https://www.raspberrypi.org/blog/five-years-of-raspberry-pi-clusters/

something that Alex's blog is not talking about is all the weird bugs we saw running kubeadm / k8s on RPI reported by users.

race conditions in the kubelet (maybe this is the "timeouts" he mentions) - causing liveness probe timeouts not reproducible on slower setups (as the argument is that this is caused by slower setups). there was a long discussion about this RPI users pressuring kubeadm to change its liveness probe settings.
potential Go compiler bugs. on weird one that i was debugging - a kubeconfig is loaded from disk, unmarshaled into a structure and a few instructions later a value is nil and there is no explanation in the source code why that can happen.

also something that i'm very curious about is, can you explain why did you need to run Kuberentes on RPI, other than experimentation? do you have real world examples from other users?

Well, as I was trying to explain (and link) above there is for me not so much the need but that it can be done.

this is the usual case that i've seen. K8s on RPI was more of a hobby project, not that that is an argument to not support the use case.

I know that Rancher has a lot of customers running lots of small clusters, and they talked about it on Kubecon. Most of it is about being able to use the same tools on the edge, that they are already using in the cloud ?

Rancher has interesting claims in this area. Edge is very broad and minimal device specs vary a lot. the other day i emailed some experts and i was told the kubeadm system requirements are quite normal for an Edge / IoT machine in terms of business ideas these days...

As far as I can tell, Rancher (now SUSE) is doing the right thing. Their distribution is certified and patches go upstream. It is at least not worse than what Red Hat is doing with OpenShift, but I wouldn't call either of them "vanilla kubernetes"...

i think we are yet to see Rancher patches in Kubernetes. there might be contributions here and there but here are some stats: https://k8s.devstats.cncf.io/d/8/company-statistics-by-repository-group?orgId=1&var-period=d7&var-metric=contributions&var-repogroup_name=All&var-companies=%22Red%20Hat%22&var-companies=%22Google%22&var-companies=%22IBM%22&var-companies=%22VMware%22&var-companies=%22Intel%22&var-companies=%22SUSE%22&var-companies=%22Rancher%20Labs%22

neolit123 commented 3 years ago

You don't want to be running Raspberry Pi Zero W (or 1), unless you want to build your own images... They are running linux/armv6, and the default "arm" images will be for linux/armv7 and won't run.

When it comes to the Raspberry Pi 2, 3, and 4 they all cost the same ($35) so there are "other factors". They have different performance vs. heat generation characteristics, and different USB generations, etc.

I went with model 3B for "arm64", as a trade-off.

Probably around $250, for a four-node cluster ?

that is quite reasonable.

Running on armv6 and 512 MB requires tradeoffs that perhaps won't be acceptable to the main project...

yes. unlikely we can get support for CP nodes on 512 RAM.

Even running on armv7 with 1 GB will be a compromise, and might require enabling swap (on the master).

it may work. it depends...from what i've seen once you enable some work on 1 GB RAM CP node, the CP components would start crashlooping and that's not really kubeadm's fault at that point.

k8s needs optimizations in some areas and nobody is sending contributions for those. users just report them, "api-server consumes a lot of CPU doing X, kubelet consumes a lot of RAM doing Y", but the patches are never sent and the maintainers don't have this as a high-priority.

I would probably just leave the "single binary" and "etcd alternatives" up to k3s - since it is already available ?

after some recent feedback that i got, i'm leaning towards -1 to the binary smashing idea, not because k3s already exists but rather i'm against the practice in general and it has to be proven that the maintenance burden is really justified.

if that is what the community and users want we can enable this custom howebrew bin-smashed distribution under a kubernetes-sigs repository and this SIG can help with the governance. my only requirement there would be to not fork anything and first provide some stats / numbers how much of an improvement the idea is in the proposal phase.

also i don't see the kubeadm maintainers wanting to maintain something like that. so if the same community and users don't step up to maintain it, i don't see it happening.

afbjorklund commented 3 years ago

the interest was reduced around that time, mostly due to the lack of CI resources. if we were able to get a way to request ARM VMs this would have been great and we could have said that ARM is truly supported.

nowadays we have ARM (the company) approaching Kubernetes looking for running clusters in CI, to more officially claim the support. similar goes for IBM and their hardware.

Having some CI machines with arm64 would be great, it would expand the "coverage" beyond amd64 ?

I have found the qemu arm support to be a bit buggy, it mostly works but there are some strange crashes For instance compiling go programs under qemu does not work*, so it needs cross-compiling or hardware

* https://github.com/boot2podman/boot2podman/issues/18#issuecomment-540780440

I made a similar request for minikube, but no takers so far (though more cloud providers have arm now) My outline was just a for a simple $100 "under the desk" set up, it would be better with some server racks

* https://github.com/kubernetes/minikube/issues/6280#issuecomment-573312349

But I don't know anything about the kubeadm project CI, maybe it is better than the minikube project CI...

neolit123 commented 3 years ago

Having some CI machines with arm64 would be great, it would expand the "coverage" beyond amd64 ?

Yes, afaik the ci is amd64 only still.

uablrek commented 3 years ago

Do not forget the CNI-plugin

On a small system some CNI-plugins must be avoided, e.g. cilium which is a horrible consumer of both CPU and memory.

neolit123 commented 3 years ago

FWIW, Rancher's kine project (which is a shim for the etcd API) now includes an example of how to setup kubeadm with external etcd mode and use mysql as the storage backend: https://github.com/rancher/kine/blob/master/examples/minimal.md#using-with-kubeadm

neolit123 commented 2 months ago

i experimented a little with building a minimal binary of kubeadm + all shipped k8s components and it was around 200 MB. that did not include etcd + coredns. it is technically possible to build an image that is ALA hyperkube and a binary that exposes all components as subcommands but there has not been recent demand for k8s to host such a project. k3s seems to fit most users, but it's still considered partly a fork.

we can revisit in the future.

kubernetes / kubeadm