rancher / elemental

Elemental is a software stack enabling centralized, full cloud-native OS management with Kubernetes.
https://elemental.docs.rancher.com/
Apache License 2.0

Automatically add the additional-ca defined in Rancher to the OS cabundle for trusted certificate authorities #1190

Open Martin-Weiss opened 7 months ago

Martin-Weiss commented 7 months ago

Describe the solution you'd like: When deploying an OS with Elemental, the node gets registered with Rancher / Elemental. As far as I understand, during that process the Elemental communication is validated against the CAs provided by Rancher.

Any further communication, such as the rancher-system-agent or RKE2 downloading additional images, needs its own separate configuration to trust the same additional-ca.

Unfortunately, the elemental register process does not store the CA cert on the OS side, so it cannot be used by anything "later".

It would be really nice if Elemental automatically added the CA that it gets from Rancher to the OS trusted CA store (i.e. copy it to /etc/pki/trust/anchors and execute update-ca-certificates to update the CA bundle of the OS).

Environment:

anmazzotti commented 7 months ago

Hello @Martin-Weiss, what would be the use case for this request? The elemental-register could inject CAs into the OS trust store, but this would also raise some security concerns.

Currently you can automate this by adding your own self-signed CA to Rancher:

Then you can simply trust it on each host by including it into the Registration's cloud-config:

Would that solve your issue?

Martin-Weiss commented 7 months ago

The "challenge" is that a customer's CA bundle can be really big, and you do not want to add / manage it in each cloud-config separately when we have already added it to the Rancher additional-ca.

Basically we would like to see tls-ca added to the OS CA bundle.

The OS is deployed and configured via Elemental using Rancher - so Elemental is also responsible for adding the trusted certificate authorities similar to SSH keys etc.

One of the challenges we face is, for example, RKE2 with a registry using self-signed SSL certificates. For this scenario we need to add the CA cert to the CA bundle of the OS or to each local certificate bundle. The OS CA bundle would be the best place.

anmazzotti commented 7 months ago

An alternative would be to create a custom image including the certificate to trust: https://elemental.docs.rancher.com/customizing/#create-a-custom-bootable-installation-media

(Note that this document has issues at the moment, since the elemental-builder-image is missing, we are fixing it asap)

Another way, maybe the easiest, is to use a certificate provider that is trusted by the base SLE Micro system. Most ACME providers that support a DNS challenge should work fine even in a private-Rancher environment.

Martin-Weiss commented 7 months ago

As we have to trust the additional-ca in Rancher anyway, why go a separate route and manage things in multiple places?

Keep in mind that we have air-gapped and self-signed CA scenarios, and unfortunately the most common CA that customers use on-premise is the one from Microsoft, which does not support ACME directly.

anmazzotti commented 7 months ago

I'm mainly worried about breaking the trust chain and creating a very exploitable scenario where we add unverifiable certs to the OS.

For example, what would you expect to happen if the Rancher CA changes? Would you expect the new CA to be trusted automatically by each host "immediately"? Or during elemental reset or only during elemental install?

Currently we propagate the Rancher CA to only a few services that need to trust it (elemental-register, rancher-system-agent, and elemental-system-agent), and only at the install phase, renewing it during reset.

Trusting the same certificate at the OS level would have a much wider impact.

Martin-Weiss commented 7 months ago

So we do not add it as trusted for the RKE2 registries.yaml. If it were in the OS, it would be fine for everything. What is the value of adding a trusted CA to each service's CA bundle? Does this give more security, especially when the OS comes through the same channels?

Managing "what trusts which CA" is the basic problem we need to solve, so why do it in so many different places instead of the central place the OS has given us for a long time?

-> We just need to copy the "to be trusted CA certs" to the ..pki/anchors.. and run the OS-specific update-ca-certificates script, and everything "on top of the OS" automatically trusts them.
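The copy-and-refresh step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Elemental works today: the helper name `install_ca`, the certificate file name, and the default anchors directory (the SUSE-style path mentioned earlier in this issue) are all hypothetical choices, and running `update-ca-certificates` on a real host requires root.

```python
import subprocess
from pathlib import Path


def install_ca(pem_text, name, anchors_dir="/etc/pki/trust/anchors", refresh=True):
    """Copy one PEM-encoded CA certificate into the trust anchors directory.

    Hypothetical helper for illustration only; not part of Elemental.
    """
    # Very light sanity check; this does not validate the certificate itself.
    if "-----BEGIN CERTIFICATE-----" not in pem_text:
        raise ValueError("input does not look like a PEM certificate")

    target = Path(anchors_dir) / f"{name}.pem"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(pem_text)

    if refresh:
        # Rebuild the OS CA bundle; needs root and the distribution's tool.
        subprocess.run(["update-ca-certificates"], check=True)
    return target
```

Passing `refresh=False` and a scratch directory makes it possible to try the file handling without touching the real trust store.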

Do we really believe that having the CA trust separated improves security? Do we run things on an "untrusted OS"? (IMO there are many more security concerns in such a case and I guess running secure on an untrusted OS will not work in a secure way, anyway..)

From a higher level, and a bit out of scope for this issue: I believe we need some sort of general "config management" in Elemental that allows us to change "small things" without a full re-deployment of each node's OS (which would be required if we added the "to be trusted CAs" to the OS image or to a cloud-config that is applied only during initial deployment).

anmazzotti commented 7 months ago

Wouldn't Ansible already be a good solution for config management? The same CA could also be managed that way. Following a cloud native approach, I would expect to reset the host and reprovision an immutable OS, or update it to a new immutable image.

Martin-Weiss commented 7 months ago

Yes - integration into an existing systems-management / configuration management would be one way to do that, but you need to sync the deployment with the systems-management (i.e. add SUSE Manager bootstrap during elemental node creation, including removal / replacement of nodes...), and so far I am not sure whether the Elemental design includes any decisions for integrating config management solutions.

Just think about 10.000 nodes out there that you manage with Elemental, and then you want to add one additional trusted CA. Do you really want to roll out 10.000 OS replacements over weak WAN network connections? Or even with sometimes disconnected setups?

"Cloud native" does not fit all requirements, or is just too heavy / too expensive for a micro-change.

anmazzotti commented 7 months ago

> Do you really want to roll out 10.000 OS replacements over weak WAN network connections?

Yes. Make sure the image can be downloaded first and checked for integrity, then reset or upgrade to it. That would be the safest and most predictable way of applying changes. I'd run OTA updates on my car that way, yes. It will take a while to download a 1GB "firmware", but it's not impossible. This can also be mitigated by having a simple registry proxy if a number of machines are within the same environment and there's a gateway to it.

In any other case I'd say Ansible would be a better, battle-tested solution. I don't think this is in the scope of Elemental. Just a personal opinion here.
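As a rough illustration of the "download first, check integrity, then apply" idea, a checksum comparison might look like the sketch below. It assumes a SHA-256 digest of the image is published out of band; the function names are hypothetical and this is not how Elemental verifies images.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large image does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path, expected_sha256):
    """Return True only if the downloaded file matches the expected digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Only when `image_is_intact` returns True would the reset or upgrade step be started.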

Martin-Weiss commented 7 months ago

Transporting a 2G image to South Africa to 1000 devices over a 2 MBit line? ;-) Just to add 10 kbytes? What a waste and what a risk.. ;-)
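To put rough numbers on that comparison, here is a back-of-the-envelope calculation. It assumes decimal units (2 GB image, 10 kB certificate, 2 Mbit/s link), ignores protocol overhead, and treats "all 1000 devices share one line" as a worst-case assumption that the thread does not actually state.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time, ignoring protocol overhead and retries."""
    return size_bytes * 8 / link_bits_per_second


IMAGE_BYTES = 2 * 10**9   # "2G image", read as 2 GB (assumption)
CA_BYTES = 10 * 10**3     # "10 kbytes"
LINK_BPS = 2 * 10**6      # "2 MBit line", read as 2 Mbit/s (assumption)
DEVICES = 1000

image_per_device = transfer_seconds(IMAGE_BYTES, LINK_BPS)  # 8000 s, about 2.2 hours
ca_per_device = transfer_seconds(CA_BYTES, LINK_BPS)        # 0.04 s

# Worst case: every device pulls its copy over the same shared line.
image_total_days = image_per_device * DEVICES / 86400       # about 92.6 days
ca_total_seconds = ca_per_device * DEVICES                  # about 40 s

print(f"image: {image_per_device:.0f} s per device, {image_total_days:.1f} days in total")
print(f"CA:    {ca_per_device:.2f} s per device, {ca_total_seconds:.0f} s in total")
```

If each device has its own 2 Mbit/s link, the per-device figures apply in parallel. A registry proxy, as suggested earlier in the thread, would also reduce the shared-line total.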