Closed onedr0p closed 6 months ago
I'm pro-Talos, but the lack of support on SBCs is a deal-breaker for some people. While cheap x86 machines are out there, and more people are adopting them, I feel like there are >0 people who use this template with SBCs that are unsupported by Talos.
I know it would relieve a big burden, but I'm also nostalgic as I started with k3s on this template, so I personally have a hard time letting it go, but I'm also not the maintainer of it.
Definitely could see splitting out Kubernetes management. Kubernetes has been reusing the idea of personas with different objectives: Infrastructure Providers, Cluster Operators, and Application Developers (see the example diagram in the Gateway API docs).
Infrastructure Providers are what Talos would help with. Cluster Operators would be responsible for creating the cluster-specific implementations such as kube-system, storage, and networking. Application Developers would be responsible for deploying applications.
I could definitely see the cluster-template shifting its focus towards that last role. Have users bring their own cluster.
I could definitely see the cluster-template shifting its focus towards that last role. Have users bring their own cluster.
Possibly, or another repo created to handle that piece and keep this one focused on "day 1" ops and the new one focused on "day 2" ops as mentioned in the past.
I'm pro-Talos, but the lack of support on SBCs is a deal-breaker for some people. While cheap x86 machines are out there, and more people are adopting them, I feel like there are >0 people who use this template with SBCs that are unsupported by Talos.
I would be curious to hear which SBCs folks are using that Talos doesn't support. Many of the popular ones seem to have support but I've never run an SBC cluster so I'm not very familiar with this.
I've personally used k3s in the past (with the now-deprecated k3OS). My experience redoing my cluster more recently with Talos is that it is much simpler to bootstrap and manage, but I'm also early on with my new cluster, so I'm not yet aware of the limitations Talos brings aside from no SSH access.
I think it would be very useful to hear from anyone who attempted to use Talos but couldn't, and what limitations they faced.
@alexwaibel You have a point. Talos has deprecated support for all SBCs and is now relying on the community to provide it. I think once that is iterated on and the Pi5 is supported by the community, it might be more tangible to support only Talos here.
I would be curious to hear which SBCs folks are using that Talos doesn't support. Many of the popular ones seem to have support but I've never run an SBC cluster so I'm not very familiar with this.
I can only speak for myself, but RPi5 support is lacking, due to the deprecation of SBC support and community overlays. Therefore, I'm still using the template to run k3s on those devices.
I'm using Raspberry Pi CM4 compute blades with Intel NUCs for my cluster. Still using an old version of the template from before the rewrite.
I've got another 10 blades on order, so when they arrive I'll be nuking and rebuilding to check support in Talos for my blade hardware.
I'm a strong supporter of keeping K3s as an option. While it may be more of a pain in the rear, Talos is, at least in my eyes, far less user-friendly when it comes to getting new people on board. This is a project I point people who want to get familiar with K8s to a lot, due to how good it is at getting a stable foundation set up. To have that taken away, at least in my eyes, would be a large blow to those who are less familiar.
One thing that slightly worries me is indeed the support, if I take Longhorn for example. It took a long time for Longhorn to add support for Talos. Without that support it was impossible to run Longhorn on Talos, and you were forced to go with something else.
Now Longhorn is getting a V2 engine (a preview feature at this time). This means they have to implement things on their side to make it supported and working again. Not sure what their priority would be for this, or how long we have to wait for it (it's currently marked for milestone 1.8.0 while we are at 1.6.1 atm).
While I understand that it's not a 1-1 integration with any other Linux, it does feel bad that people might have to run older versions of particular software, or even be unable to run particular software of their choosing because it has no Talos support, unlike Linux with k3s, which usually works out of the box.
On the flip side of this, Talos has quite some benefits over running a linux OS with a flavor of k8s on it. Not even talking about the configuration part that now needs to be maintained to keep k3s running.
I'm pro and con removing k3s for the above reasons.
Thanks for the feedback so far everyone... nothing will happen yet and this template is pretty stable. I really would like to see Talos get support for RasPi5 and success stories with Longhorn before making the ultimate decision of solely focusing on Talos.
I'm pro Talos. I've installed it on my mini PCs (Intel 100s) and it works great. I was skeptical about Talos because it is immutable, but I realized I don't even want to do anything else on those PCs.
Of course most people probably have SBCs, so I'm in the minority.
I'm pro Talos, too. I just switched from Debian and k3s to Talos on my VM cluster and it worked out of the box. I just had to adapt the talosconfig.yaml for use with Longhorn. So far no problems, no hurdles, no downtime. Being deeply into infosec, I also like the tremendous security plus offered by using an immutable OS.
I've always liked immutable boot systems like Talos (I previously used SmartOS, which was similar on boot), but in home labs people do seem to use a variety of hardware that is missing Talos drivers. Kubesearch shows a decent number of people running Frigate, which is frequently run on a node with GPU support for video encode/decode, and a Coral USB TPU for inference. There are no Coral USB drivers for Talos (they do have a PCIe Coral driver though), and I'd presume drivers for newer APUs and iGPUs will be similarly behind.
If the goal is just a home lab to learn and experiment with, then accepting those limitations with Talos seems fine. But if this is a starting point for people to run home workloads across nodes (which does seem to be what's happening), then the broader hardware support of Debian/k3s seems hard to leave behind.
There are no Coral USB drivers for Talos (they do have a PCIe Coral driver though)
Talos doesn't need drivers for the USB. I'm using Frigate with the Coral USB and it works fine on my cluster. In fact, any USB device should just work.
Similarly, I don't need drivers for my Talos USB zigbee device.
Well that certainly addresses all my needs. Sorry for the noise.
Here's my two cents. I belong to the group of users who started with Single Board Computers (SBCs) independently and later discovered this project/community. Currently, I'm part of the Talos group and although SBCs are becoming less of an option, I believe that k3s is a better starting point than Talos.
Perhaps, we could consider a compromise solution by simplifying the k3s version and presenting it as an entry point to Flux and K8s. I think that most people (forgive the generalization) who use this project are individuals with a lot of enthusiasm and skills. Even though the technical part with good documentation wouldn't be a hurdle, the fact that Talos doesn't offer the same support as k3s for SBCs could be a barrier to this project's growth. However, I understand the difficulty of maintaining both configurations. Thanks a lot!
Best regards!
Actually, I think we should change our thinking.
onedrop should use whatever he feels comfortable with. He makes this available to us, so his opinion is the biggest factor.
But if anyone is interested, they can maintain a k3s branch.
Thanks all for the feedback. I have opened a PR for supporting only Talos; this will make this project a lot easier to support moving forward.
@onedr0p I know it's too late now, but you asked for feedback… I just wanted to start with your project, but turned around on the doorstep, because it's unusable now, since my x86 projects and installations all lean toward AI/GPU use and Talos stated: here
To be clear, Talos will not support the gpu-operator since it wants to manage building and loading modules which is not going to be supported by Talos
done… bye
Since AI will be everywhere tomorrow, and as OSS enthusiasts "we" tend to run the stuff ourselves, Talos will not let you do AI/GPU use for containers… so kicking out k3s, IMHO, is a strategic decision against AI/GPU use on the clusters.
Another aspect, which I was missing in the feedback above: Talos may be good for BM/VM k8s clusters only. In my use case, all my SBCs have multiple jobs to do, so they need a Linux, preferably Debian ;) and additionally, none of the Hardkernel SBCs are supported.
I'm looking out for something else…
@chymian Feel free to use a previous version of this repo, as far as I know everything still works for k3s right up until before the commit that removed support.
https://github.com/onedr0p/cluster-template/tree/f4eb701ac6b5ccc0d336c41bd63c5a545ccb575e
Or maybe use one of the alternatives I listed in the README.
It's not quite as dire as it seems. They do have a system extension to install the kernel modules, and a way to register the runtime class. You can also consider the container device interface as a future-looking way of binding a device to a pod. I'm running a cluster with the operator running with driver management disabled to run LLM training and inference at home and it's working fine. It's not as flexible as having the whole operator functionality, sure, but it's very much usable.
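For anyone wondering what "register the runtime class" looks like in practice, here is a minimal sketch in Python that just builds the two Kubernetes manifests as plain dicts and prints them as JSON (which kubectl accepts). It assumes the runtime handler is called `nvidia` and that a device plugin advertises the `nvidia.com/gpu` resource; the pod name and image are hypothetical placeholders, not something from this thread.

```python
import json


def runtime_class(name="nvidia", handler="nvidia"):
    """Build a RuntimeClass manifest; the handler name is an assumption."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def gpu_pod(pod_name, image, runtime_class_name="nvidia", gpus=1):
    """Build a Pod manifest that opts into the runtime class and asks for GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Assumes a device plugin exposes this resource name.
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifests = [runtime_class(), gpu_pod("gpu-test", "example.invalid/cuda-test:latest")]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

The point is only that the pod selects the runtime through `runtimeClassName`; the kernel modules themselves still come from the system extension mentioned above.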
since AI will be everywhere tomorrow
I'll keep an eye out on Friday, but I'm pretty sure that's not the case. I have zero nodes that could even fit a GPU, let alone SBCs that could run LLMs.
This is also a template to just get started. There's nothing stopping you from running your own k3s, talos, k0s, or any other cluster.
done… bye
Don't let the door hit you on the way out? Not sure why you need to come across so aggressively on a free passion project.
Getting Talos in here has been a big win, and every day I use it I am more impressed with it. I've been pondering dropping k3s support so that I can 100% focus on the tool I am currently using in my homelab (Talos).
The benefits of k3s include the ability to use SSH and any system and tweak/debug Debian easily as needed. However there are some downsides to using k3s that I have experienced in this type of setup and I'll sum them up below.
k3s exposes the metrics for kubelet, kubeApiServer, kubeProxy, kubeControllerManager, and kubeScheduler on the same endpoint, which makes integration with kube-prometheus-stack annoying to work with since you need to drop duplicate metrics on the serviceMonitors or else Prometheus's memory will balloon to 3x-5x what it should be.
HelmChart isn't great either, as there are some nuances to making Flux take it over.
The downsides for Talos are a thing too. The two biggest ones are that people need to learn talosctl to interact with nodes since there is no SSH, and that Talos might not support all hardware natively or through their system extensions (e.g. SBCs are now only community-supported).
Overall I think it would be beneficial to go back to supporting one Linux distribution (in this case Talos) so that testing and the maintenance burden on myself can be more achievable.
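To illustrate the duplicate-metrics issue, here is a rough Python sketch of what a "keep" relabeling rule on each serviceMonitor does when five scrape jobs all hit the same endpoint. The per-job regexes are illustrative assumptions, not the rules used in this template, and the rule dicts only mirror the `sourceLabels` / `regex` / `action` shape of a relabeling entry.

```python
import re

# Assumed, illustrative patterns: which metric names each scrape job keeps.
KEEP_REGEX = {
    "kubelet": r"(kubelet|container|volume)_.*",
    "kubeApiServer": r"(apiserver|etcd)_.*",
    "kubeControllerManager": r"workqueue_.*",
    "kubeScheduler": r"scheduler_.*",
    "kubeProxy": r"kubeproxy_.*",
}


def keep_rule(job):
    """Build a relabeling rule (plain dict) that keeps only this job's metrics."""
    return {"sourceLabels": ["__name__"], "regex": KEEP_REGEX[job], "action": "keep"}


def apply_rule(rule, metric_names):
    """Mimic the 'keep' action; Prometheus regexes are fully anchored."""
    pattern = re.compile(rule["regex"])
    return [name for name in metric_names if pattern.fullmatch(name)]


def stored_series(endpoint_metrics, filtered):
    """Count the series stored when every job scrapes the same endpoint."""
    total = 0
    for job in KEEP_REGEX:
        if filtered:
            total += len(apply_rule(keep_rule(job), endpoint_metrics))
        else:
            total += len(endpoint_metrics)
    return total


if __name__ == "__main__":
    shared_endpoint = [
        "kubelet_running_pods",
        "container_cpu_usage_seconds_total",
        "apiserver_request_total",
        "workqueue_depth",
        "scheduler_pending_pods",
        "kubeproxy_sync_proxy_rules_duration_seconds",
    ]
    print("unfiltered:", stored_series(shared_endpoint, filtered=False))
    print("filtered:  ", stored_series(shared_endpoint, filtered=True))
```

Without the rules, every job stores every metric the shared endpoint exposes, so the series count scales with the number of jobs; with them, each metric is stored once.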