Open NikolaiBessonov opened 3 weeks ago
I'm not quite sure what the question is; the service-account-issuer is set to the control plane endpoint in Talos.
By downtime I guess you mean that service account tokens stop working? (these are used only by workload pods)
Since no communication between cluster components breaks when you change the endpoint, simply restarting the pods that use service accounts should be sufficient.
@smira Not quite that.
For example, if I set the new address (VIP) as the endpoint on one of the control plane nodes, the API server stops working because it can no longer authenticate requests. I think that if I change the endpoint on all my control plane nodes, some components will require a restart (such as the Cilium CNI components, which also can't authenticate against the control plane with the new endpoint), and that leads to downtime until all components are restarted.
But if there were support for an additional "service-account-issuer" parameter, where I could also specify the old load balancer on the nodes, this could be done without any errors or downtime, similar to point 8 of the "Migration from kubeadm. Step-by-Step guide" in the docs.
Yes, service-account-issuer might be done better, but I guess it has nothing to do with Cilium.
First of all, Cilium should be configured to use the Talos KubePrism endpoint; that's much better than using the actual cluster endpoint.
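For reference, a minimal sketch of the Cilium Helm values for that, assuming KubePrism is enabled on its default localhost:7445 (the port is configurable in the Talos machine config, so check yours):

    # Cilium Helm values (sketch): point Cilium at the local KubePrism
    # endpoint instead of the external control plane endpoint.
    k8sServiceHost: localhost
    k8sServicePort: 7445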
I guess what happens is that changing the endpoint re-rolls the kube-apiserver certificate, and the old and new certificates don't match for you; that can be solved by updating certSANs. Once again, service-account-issuer should be made configurable, but your issue is something else.
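As a sketch, keeping both the old and the new address valid in the kube-apiserver certificate would look something like this in the machine config (the addresses are placeholders, and I'm assuming cluster.apiServer.certSANs is the field you want here):

    # Talos machine config (sketch); addresses are placeholders.
    cluster:
      apiServer:
        certSANs:
          - 10.0.0.100           # new VIP
          - old-lb.example.com   # old load balancer, kept during the migration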
@smira Sorry for the delayed response. We tested this and applied it to the production cluster. Changing the endpoint (on all three control planes) involves restarting all pods that talk to the kube-apiserver in any way, such as Cilium, cert-manager, the ingress controller, etc. With a simple script that finds all service accounts and their pods it goes faster, but it still means at least a minute of downtime. Unfortunately, this is the fastest way I found. I have no more questions. We can close the issue, but it would be better if you added the ability to do this without downtime and without restarting all pods.
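For anyone else hitting this, the kind of script I mean is roughly the following (a sketch, not the exact one we used): it deletes every pod that mounts a service account token, so each one comes back with a freshly issued token.

    # Sketch: restart every pod that mounts a service account token.
    # Pods without "automountServiceAccountToken: false" mount one by default,
    # so in practice this restarts almost everything.
    kubectl get pods -A -o json \
      | jq -r '.items[]
          | select(.spec.automountServiceAccountToken != false)
          | "\(.metadata.namespace) \(.metadata.name)"' \
      | while read -r ns pod; do
          kubectl delete pod -n "$ns" "$pod" --wait=false
        done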
I think the issue itself makes sense: what you would like to see is support for additional service-account-issuer values, i.e. the previous control plane endpoint, so that existing tokens are still considered valid.
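For context, upstream kube-apiserver already accepts --service-account-issuer more than once (the first value is used to issue new tokens, the remaining ones stay accepted for verification), so the request essentially maps to something like this on the apiserver side (addresses are placeholders):

    # kube-apiserver flags (sketch): the first issuer signs new tokens,
    # the second is still accepted when validating existing tokens.
    kube-apiserver \
      --service-account-issuer=https://new-vip:6443 \
      --service-account-issuer=https://old-lb.example.com:6443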
Bug Report
Unable to authenticate after changing cluster.controlPlane.endpoint in machineconfig
Description
After updating cluster.controlPlane.endpoint to point to controlplane-3, authentication against the kube-apiserver fails. The issue seems to be related to the --service-account-issuer argument, which should now contain the new VIP address; the mismatch likely occurs because the nodes are still referencing the old load balancer endpoint. But there is no way to set an additional value for this parameter.
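For reference, the change that triggers this is just the endpoint field in the machine config, something like the following (the VIP address below is a placeholder for ours):

    # Talos machine config (sketch); the address is a placeholder.
    cluster:
      controlPlane:
        endpoint: https://10.0.0.100:6443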
Steps to Reproduce
Logs
kube-apiserver on the node where you changed cluster.endpoint:
authentication.go:73] "Unable to authenticate the request" err="invalid bearer token"
Environment
Question
Until this is fixed, is there any way to change cluster.endpoint without breaking authentication?