Hacksawfred3232 opened this issue 1 month ago
It looks like there are two parts here:
1. Today, if you use a custom TrustedRootsConfig during the initial bootstrap to connect to Omni, you can (and should) add the same TrustedRootsConfig as a config patch in Omni to keep it active. Omni "overwrites" the machine configuration as it installs Talos and attaches the machine to the cluster.
2. You can (and should) supply the initial bootstrap configuration to the vanilla Talos image via a "user-data" mechanism: for example, on AWS you can use AWS user-data, and on bare metal you can use the Talos talos.config= kernel argument.
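As a rough illustration of point 1, the sketch below renders a TrustedRootsConfig document as text, so the same document can be used both in the bootstrap config and as an Omni config patch. It assumes the document layout described in the Talos 1.8 documentation (`apiVersion: v1alpha1`, `kind: TrustedRootsConfig`, `name`, and a `certificates` block holding PEM text). The function name and the `corp-ca` name are hypothetical, and the certificate is a placeholder.

```python
def render_trusted_roots_config(name: str, pem_bundle: str) -> str:
    """Render a TrustedRootsConfig document as YAML text.

    Assumed layout (Talos 1.8+): apiVersion/kind/name plus a literal block
    scalar "certificates" holding one or more PEM certificates.
    Assumes `name` is a plain identifier that needs no YAML quoting.
    """
    lines = [
        "apiVersion: v1alpha1",
        "kind: TrustedRootsConfig",
        f"name: {name}",
        "certificates: |",
    ]
    # Indent every PEM line by two spaces so it stays inside the block scalar.
    for pem_line in pem_bundle.strip().splitlines():
        lines.append(f"  {pem_line}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder certificate body; substitute the real internal CA here.
    placeholder = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----"
    print(render_trusted_roots_config("corp-ca", placeholder))
```

Generating the document from one source keeps the bootstrap copy and the Omni patch copy from drifting apart.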
Problem Description
This is most likely already a checklist item internally, but I'll go ahead and submit this anyway.
Talos Linux 1.8 introduced a new manifest, "TrustedRootsConfig", which allows custom CA certificates to be added. In an issue I opened in the Talos Linux repo, I reported that this feature was broken, because I thought a plugin was overwriting the certificates. However, further investigation, using Talhelper instead of Omni to establish the cluster, revealed that Omni was actually at fault: it sends a completely new config to the target machines, overwriting everything that was already there. In hindsight I should have caught this sooner. I had checked the "STATE" and "META" partitions of one of the nodes to see what was being written into the config, and should have noticed that other manifests, such as the SideroLink and log sink parameters, were present, but the "TrustedRootsConfig" manifest was not.
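The check described above (seeing which manifests actually ended up in a node's config) can be scripted. The sketch below assumes you already have the multi-document machine configuration as plain text, with documents separated by `---` lines. It only lists each document's top-level `kind` and is a rough line scan rather than a YAML parser; the function name is hypothetical.

```python
def document_kinds(machine_config: str) -> list[str]:
    """Return the top-level `kind` of each document in a multi-document config.

    Documents without a top-level `kind:` key (such as the main v1alpha1
    machine config, which uses `version:` instead) are reported as "(none)".
    This is a plain line scan, not a full YAML parser.
    """
    kinds = []
    current = None
    seen_content = False
    for line in machine_config.splitlines():
        if line.strip() == "---":
            # Document separator: record the previous document if it had content.
            if seen_content:
                kinds.append(current or "(none)")
            current, seen_content = None, False
            continue
        if line.strip():
            seen_content = True
        # Only unindented keys are top-level; nested "kind:" keys are ignored.
        if line.startswith("kind:") and current is None:
            current = line.split(":", 1)[1].strip()
    if seen_content:
        kinds.append(current or "(none)")
    return kinds


if __name__ == "__main__":
    import sys

    kinds = document_kinds(sys.stdin.read())
    print(kinds)
    print("TrustedRootsConfig present:", "TrustedRootsConfig" in kinds)
```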
Without the "TrustedRootsConfig" manifest on the nodes, a cluster managed by Omni cannot work inside a network that requires an internal CA for internal services, where using Let's Encrypt or a similar public CA internally is disallowed by the head of IT or upper management.
Solution
The most likely solution would be to tell the Omni server which additional CA certificates are used in the network, so that Omni can include them in a "TrustedRootsConfig" manifest sent to the nodes alongside the main config. This could be done either through a command-line parameter or, for feature parity between SaaS and self-hosted, through a config option in the Omni control panel.
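For the proposed option, Omni would need to accept a CA bundle and split it into individual certificates before emitting a TrustedRootsConfig manifest. A minimal sketch of that input handling, assuming the bundle is plain PEM text: `split_ca_bundle` is a hypothetical helper, not an Omni API, and it only checks PEM framing and base64 encoding, not whether each certificate is a valid X.509 CA certificate.

```python
import base64
import binascii
import re

# Matches one PEM certificate block; DOTALL lets the body span multiple lines.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_ca_bundle(bundle: str) -> list[str]:
    """Split a PEM bundle into individual certificate blocks.

    Only checks PEM framing and that each body is valid base64; it does NOT
    parse X.509 or verify that a certificate is actually a CA.
    """
    blocks = []
    for match in _PEM_CERT.finditer(bundle):
        body = "".join(match.group(1).split())  # drop line breaks and spaces
        try:
            base64.b64decode(body, validate=True)
        except binascii.Error as exc:
            raise ValueError(
                f"certificate {len(blocks) + 1} is not valid base64"
            ) from exc
        blocks.append(match.group(0))
    if not blocks:
        raise ValueError("no PEM certificates found in bundle")
    return blocks
```

Rejecting a malformed bundle at input time would surface the error in the control panel or CLI rather than on a node that silently fails to trust the internal CA.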
This won't help with the initial bootstrapping of nodes to Omni (on self-hosted). I've worked around that by running a separate HTTP server that supplies a boot-time config file containing the SideroLink parameters plus a "TrustedRootsConfig" manifest, which allows a node to connect to Omni for the first time. Omni could possibly fulfill that role as well, by letting the user choose between having the parameters supplied as boot parameters or served over plain HTTP via the "talos.config" boot parameter, though the latter would need some ACLs to prevent unauthorized access.
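The workaround above can be sketched as a small HTTP server that only hands the boot-time config to machines on an expected subnet, which is one simple form of the ACL mentioned. This is a minimal sketch using Python's standard library; the file name `bootstrap.yaml`, port 8080, and the `192.168.10.0/24` subnet are hypothetical placeholders, and a node would be pointed at it with a kernel argument such as `talos.config=http://<server>:8080/bootstrap.yaml`. Source-IP filtering over plain HTTP is weak protection, since the config still travels unencrypted.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def is_allowed(client_ip: str, networks) -> bool:
    """Return True if client_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    return any(addr in net for net in networks)


def make_handler(config_bytes: bytes, allowed_cidrs: list[str]):
    """Build a request handler that serves config_bytes to allowed clients only."""
    networks = [ipaddress.ip_network(c) for c in allowed_cidrs]

    class ConfigHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Simple source-IP ACL: reject anything outside the allowlist.
            if not is_allowed(self.client_address[0], networks):
                self.send_error(403, "forbidden")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/yaml")
            self.send_header("Content-Length", str(len(config_bytes)))
            self.end_headers()
            self.wfile.write(config_bytes)

    return ConfigHandler


if __name__ == "__main__":
    # Hypothetical values: the config file and the node subnet are placeholders.
    with open("bootstrap.yaml", "rb") as f:
        handler = make_handler(f.read(), ["192.168.10.0/24"])
    ThreadingHTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```

The handler serves the same bytes for every path to keep the sketch short; restricting the path, or serving per-machine files, would be a natural next step.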