siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

docs: KubePrism is actually not enabled by default on existing clusters #8179

Open yoctozepto opened 7 months ago

yoctozepto commented 7 months ago

Bug Report

Despite the (now misleading) release notes and docs for 1.6, KubePrism is not enabled by default on existing clusters. It is enabled by default only for new clusters, i.e., those whose config is newly generated by the CLI. The original PR https://github.com/siderolabs/talos/pull/7788 also incorrectly dropped the docs showing how to enable KubePrism, which are still needed for existing clusters.

I suggest updating the release notes and docs to reflect the actual meaning of this change.

smira commented 7 months ago

You can use the 1.5 docs to see how to enable KubePrism, but yes, the release notes need to be corrected. Talos never enables any features on upgrade.

yoctozepto commented 7 months ago

Hmm, but the docs for 1.6+ should not drop the instructions for configuring KubePrism, as they are still valid...

brujoand commented 5 months ago

I've just set up a new cluster with installer version '1.6.7' and talosctl version 'v1.4.8'. I was not allowed to add machine.features.kubePrism using 'talosctl patch'. Instead I had to manually edit the config or patch the node directly. Without this there was no KubePrism.

smira commented 5 months ago

Please use the same version of talosctl as your Talos cluster. Out-of-sync versions might not work correctly.

brujoand commented 5 months ago

Yeah, that sounds right. The only installation instructions I could find at the time were pipe-to-bash.

Reading through that script (as I should have done before executing it) shows that the talosctl binary is actually neatly released with everything else. So my contribution here was user error and thus off topic for this issue.

cehoffman commented 2 months ago

Maybe I'm missing something, but I have a cluster that started on 1.4 (I forget which patch release) and has been upgraded through 1.5.{3,4,5}, 1.6.7, and now 1.7.5. I hadn't had a chance to use KubePrism yet, but while doing some cluster configuration maintenance I came across changing Cilium to use localhost:7445 for the API server. However, when I try this, the components are not able to make a connection. I can use localhost:6443 on the control-plane nodes, but that isn't KubePrism; it is the actual api-server pod.

I'm unable to set the KubePrism feature in the machine config, and cycling in new machines doesn't result in them having KubePrism active. For example, one machine which had gone through these upgrades still had the stable ifname disabled, since that was a post-1.4 change. Resetting this machine and applying the config results in the ifname change taking effect, but KubePrism is not active (at least nothing is listening on 7445). The machine config setting for kubeprism is rejected, I expect because it should now default to on. KubePrism, however, is not on, and there is no way to enable it on upgraded clusters.
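A quick way to confirm the "nothing listening on 7445" observation is a plain TCP connect test. The sketch below is only an illustration: the function name `is_listening` is made up here, and it assumes the check runs somewhere that shares the node's network namespace (for example a host-network pod), since the thread describes the endpoint as localhost:7445.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 7445, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    7445 is the KubePrism port mentioned in this thread; the helper itself
    is a hypothetical illustration, not part of Talos or talosctl.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connect only shows that something accepts connections on the port; it does not prove that the listener is KubePrism.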

smira commented 2 months ago

Talos Linux never enables new features automatically on upgrade if they are configurable, so upgrades are safe in the sense that they hold fewer surprises.

When upgrading, you can look through the release documentation to figure out how/when to enable new features (if you'd like to enable them): e.g. KubePrism.

When the docs say that it's enabled by default for new clusters, it means that machine configuration generated for Talos version >= X now enables this feature explicitly in the machine configuration.
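To make the distinction concrete, here is a minimal sketch of "enabled explicitly in the machine configuration" versus a config from an upgraded cluster that simply lacks the field. The field path machine.features.kubePrism and port 7445 come from this thread and the 1.5 docs it refers to; `kubeprism_enabled` and `merge_patch` are hypothetical helpers, and `merge_patch` is a simplified stand-in rather than Talos's actual patching logic.

```python
# Patch content for an existing cluster, using the field named in this thread.
KUBEPRISM_PATCH = {
    "machine": {"features": {"kubePrism": {"enabled": True, "port": 7445}}}
}


def kubeprism_enabled(machine_config: dict) -> bool:
    """True only when the config explicitly enables KubePrism.

    An absent field counts as disabled, matching the behaviour described
    above for clusters upgraded from older versions.
    """
    features = (machine_config.get("machine") or {}).get("features") or {}
    return bool((features.get("kubePrism") or {}).get("enabled", False))


def merge_patch(config: dict, patch: dict) -> dict:
    """Recursively merge patch into a copy of config (simplified stand-in)."""
    merged = dict(config)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = value
    return merged
```

With these helpers, a config that has no `kubePrism` entry reports `False`, and the same config merged with `KUBEPRISM_PATCH` reports `True`, which is the difference between an upgraded cluster and a freshly generated config.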

cehoffman commented 2 months ago

Thanks @smira for the response. The point of my post was not to say that it should have been automatically enabled. I was under the impression, both from my own experience trying to enable it post-upgrade and from a previous post in this issue, that it could not be enabled. Since it doesn't show up by default in the machine configuration, I went to the linked PR that removed the enablement documentation and ended up using the wrong configuration field. The PR has the example as kubeprism for the feature when it is actually kubePrism. The published docs for 1.5 have the right value.
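Because the field name is case-sensitive, a small lint over the parsed machine config can catch this kind of mistake before the config is applied. This is a sketch only: `miscased_feature_keys` is a hypothetical helper, and `KNOWN_FEATURE_KEYS` lists just the one key discussed in this issue, not the full Talos schema.

```python
# Only the key discussed in this issue; not the complete list of features.
KNOWN_FEATURE_KEYS = {"kubePrism"}


def miscased_feature_keys(machine_config: dict) -> dict:
    """Map each wrongly cased key under machine.features to its correct spelling."""
    features = (machine_config.get("machine") or {}).get("features") or {}
    canonical = {name.lower(): name for name in KNOWN_FEATURE_KEYS}
    return {
        key: canonical[key.lower()]
        for key in features
        if key.lower() in canonical and key != canonical[key.lower()]
    }
```

For a config that uses `kubeprism`, the function returns a mapping from `kubeprism` to `kubePrism`; for a correctly spelled config it returns an empty dict.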


The lack of the current value in the machine config seems like a reproducibility miss. If one were to try to recover a cluster from an etcd snapshot and the machine configs, would that not result in KubePrism being enabled, since the machine configs lack the now non-default value?

smira commented 2 months ago

> The lack of the current value in the machine config seems like a reproducibility miss. If one were to try to recover a cluster from an etcd snapshot and the machine configs, would that not result in KubePrism being enabled, since the machine configs lack the now non-default value?

Machine config default values don't change over time, so there's no reproducibility problem.

If the documentation is wrong, please send a PR.