siderolabs / cluster-api-bootstrap-provider-talos

A cluster-api bootstrap provider for deploying Talos clusters.
https://www.talos-systems.com
Mozilla Public License 2.0

Generated talosconfig contains VIP as endpoint, leading to talosctl operation failures #161

Closed: magicite closed this issue 1 year ago

magicite commented 1 year ago

See siderolabs/talos#6796.

The talosconfig generated by cabpt for my clusters with a VIP includes the VIP as an endpoint. This causes numerous issues when using the talosconfig with talosctl, for example with the upgrade-k8s command (see referenced issue).

I believe the VIP should just be ignored for purposes of generating the talosconfig file.

smira commented 1 year ago

CAPI is not smart enough to guess which address is which.

CABPT takes all the addresses of the Machines. It can't guess which of them are suitable as endpoints.

In your case you're probably using Sidero, and that reports the VIP as one of the addresses as well. So if anything should change, it's Sidero that should report it in a different way.

magicite commented 1 year ago

Yes, this is in a cluster generated using CAPI with the Sidero, cacppt, and cabpt providers in place. Should I open an issue in siderolabs/sidero?

smira commented 1 year ago

I'm not positive it's an issue.

At some point, talosconfig had no endpoints when generated by CABPT, and users had to set them manually.

We added automatic endpoints to talosconfig, and now we have an issue that some IPs should not be listed as endpoints.

I don't think automatic endpoints will ever be completely accurate, as machines might have multiple networks (private, public, etc.), and CABPT has no way to know which of their addresses are good as endpoints. In the same way, nodes might have BGP addresses which work like VIPs, but are transparent to Talos.

So my point is that automatic endpoints provide some way to get a talosconfig which is good enough to start interacting with Talos API, but if one needs a robust solution with machines which have more than a single address, talosconfig should have endpoints set manually based on the actual knowledge of the network.

smira commented 1 year ago

It's not hard to set endpoints if you know what they should be in your network: run kubectl get machines for the control plane, filter .status.addresses, build a list, and use talosctl config endpoints to update the endpoints to the desired values.
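
The steps in the comment above can be sketched in Python. This is only an illustration, not something CABPT or talosctl provides: the function name select_endpoints, the sample addresses, and the choice to keep only InternalIP and ExternalIP entries are all assumptions. It expects JSON shaped like the output of kubectl get machines -o json (for example, restricted to control plane Machines with a label selector) and drops any address you name as a VIP.

```python
import json

# Address types treated as endpoint candidates (an assumption; adjust per network).
ENDPOINT_TYPES = {"InternalIP", "ExternalIP"}


def select_endpoints(machine_list, exclude=()):
    """Collect unique addresses from each Machine's .status.addresses,
    skipping any address listed in `exclude` (e.g. the VIP)."""
    excluded = set(exclude)
    endpoints = []
    for machine in machine_list.get("items", []):
        for addr in machine.get("status", {}).get("addresses", []):
            ip = addr.get("address")
            if addr.get("type") not in ENDPOINT_TYPES:
                continue
            if not ip or ip in excluded or ip in endpoints:
                continue
            endpoints.append(ip)
    return endpoints


# Hypothetical sample shaped like `kubectl get machines -o json` output.
SAMPLE = json.loads("""
{
  "items": [
    {"status": {"addresses": [
      {"type": "InternalIP", "address": "10.0.0.11"},
      {"type": "InternalIP", "address": "10.0.0.100"},
      {"type": "Hostname", "address": "cp-1"}
    ]}},
    {"status": {"addresses": [
      {"type": "InternalIP", "address": "10.0.0.12"}
    ]}}
  ]
}
""")

VIP = "10.0.0.100"  # hypothetical VIP, known from your own network layout
endpoints = select_endpoints(SAMPLE, exclude=[VIP])

# Print the command to run; the sketch does not invoke talosctl itself.
print("talosctl config endpoints " + " ".join(endpoints))
```

The exclusion list is passed in explicitly because, as discussed above, nothing in the Machine status distinguishes a VIP from a regular address; that knowledge has to come from whoever knows the network.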

magicite commented 1 year ago

Makes sense. I was hoping for a more automated solution to the problem, but I can see why that would be non-trivial. In that case, I'd recommend adding a note at "Retrieve the talosconfig" directing the user to audit the file and remove the VIP and any other invalid IPs that might appear.
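
As a rough sketch of what such an audit could look like, the Python below removes unwanted addresses from a talosconfig that has already been parsed into a dict. It assumes the usual layout where each entry under contexts carries an endpoints list; the function name prune_endpoints, the context name, and the addresses are hypothetical. Loading and saving the YAML file is left out, since that needs a YAML parser outside the standard library.

```python
def prune_endpoints(talosconfig, unwanted):
    """Return a copy of a parsed talosconfig with the unwanted
    addresses removed from every context's endpoints list."""
    unwanted = set(unwanted)
    pruned = dict(talosconfig)
    pruned["contexts"] = {}
    for name, ctx in talosconfig.get("contexts", {}).items():
        new_ctx = dict(ctx)
        new_ctx["endpoints"] = [
            e for e in ctx.get("endpoints", []) if e not in unwanted
        ]
        pruned["contexts"][name] = new_ctx
    return pruned


# Hypothetical parsed talosconfig; certificate fields omitted for brevity.
config = {
    "context": "my-cluster",
    "contexts": {
        "my-cluster": {
            "endpoints": ["10.0.0.11", "10.0.0.12", "10.0.0.100"],
            "nodes": [],
        }
    },
}

cleaned = prune_endpoints(config, unwanted=["10.0.0.100"])
print(cleaned["contexts"]["my-cluster"]["endpoints"])
```

The function returns a new dict rather than editing in place, so the original can be kept for comparison before anything is written back.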