rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Terraform Standalone Use Case: Hardened #4768

Open matttrach opened 12 months ago

matttrach commented 12 months ago

This tracks progress on satisfying a hardened RKE2 use case.

We will need to harden the OS

We will need to follow the hardening guide for RKE2: https://docs.rke2.io/security/hardening_guide

matttrach commented 12 months ago

The approach on this one will be to enable immutable infrastructure:

  1. Provision objects necessary to provision and configure server
  2. Provision server on AWS
  3. Harden server
  4. Install RKE2
  5. Harden RKE2
  6. Clean the install (remove anything which might be specific to the server)
  7. Generate an AMI from the server
    • The user can now move their new custom AMI to a secure region and deploy it in an air-gapped VPC
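
A minimal sketch of step 7 and the follow-up copy to another region, assuming the AWS CLI is used. The instance ID, AMI ID, image name, and regions are placeholders, and the `run`, `create_ami`, and `copy_ami` helpers are hypothetical names, not part of any module mentioned in this issue. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 7: generate an AMI from the hardened, cleaned server.
# Arguments: instance ID, image name (both placeholders here).
create_ami() {
  run aws ec2 create-image --instance-id "$1" --name "$2"
}

# Sub-bullet: copy the resulting AMI into another region.
# Arguments: source region, source AMI ID, destination region, image name.
copy_ami() {
  run aws ec2 copy-image --source-region "$1" --source-image-id "$2" \
    --region "$3" --name "$4"
}

# Dry-run demonstration with placeholder values.
create_ami i-0123456789abcdef0 rke2-hardened-example
copy_ami us-west-2 ami-0123456789abcdef0 us-east-1 rke2-hardened-example
```

Setting `DRY_RUN=0` would execute the commands for real, which requires valid AWS credentials and real IDs.
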

matttrach commented 11 months ago

Focus on RHEL as the first hardened OS.

matttrach commented 11 months ago

The CIS Benchmarks appear to be the standard for achieving a hardened OS. CIS also provides custom AMIs on AWS that are pre-configured for their benchmarks. The STIG benchmark for RHEL is the one we should use for servers. There is also a distribution-independent benchmark that we might use for other server types; it contains multiple levels of suggestions, so look for the "server - level 2" suggestions.

matttrach commented 11 months ago

To harden RKE2 on RHEL 8, we should be able to get by with applying the CIS sysctl config as follows, adding a user for etcd, and setting the profile flag in the RKE2 config.

Small script to enable the CIS sysctl config:

sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf && \
sudo systemctl restart systemd-sysctl && \
sudo useradd -r -c "etcd user" -s /sbin/nologin -M etcd -U
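
One way to confirm that the settings took effect after running the script above is to compare each `key=value` line in the copied file against the live kernel value. The sketch below is an illustration only: `check_sysctl_conf` is a hypothetical helper, and its second argument (the root of the sysctl tree, defaulting to `/proc/sys`) exists only so the check can be exercised against a fake directory.

```shell
#!/bin/sh
# Compare each key=value line of a sysctl conf file with the value found
# under the sysctl tree. Returns non-zero if any key differs or is unreadable.
check_sysctl_conf() {
  conf_file=$1
  proc_root=${2:-/proc/sys}   # overridable for testing
  status=0
  while IFS='=' read -r key want || [ -n "$key" ]; do
    # Strip whitespace from both sides of the assignment.
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    want=$(printf '%s' "$want" | tr -d '[:space:]')
    # Skip blank lines and comments.
    case "$key" in ''|'#'*|';'*) continue ;; esac
    # vm.overcommit_memory -> /proc/sys/vm/overcommit_memory
    path="$proc_root/$(printf '%s' "$key" | tr . /)"
    have=$(cat "$path" 2>/dev/null | tr -d '[:space:]')
    if [ "$have" = "$want" ]; then
      echo "ok   $key=$have"
    else
      echo "FAIL $key: want $want, have ${have:-unreadable}"
      status=1
    fi
  done < "$conf_file"
  return $status
}
```

On the server this would be called as `check_sysctl_conf /etc/sysctl.d/60-rke2-cis.conf`.
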

Example RKE2 config with the CIS profile enabled:

write-kubeconfig-mode: 644
cni: calico
cloud-provider-name: "aws"
profile: "cis-1.23"
selinux: true
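
As a rough illustration, the config above could be generated by a small helper so that the profile and CNI can be varied per environment. `render_rke2_config` is a hypothetical function name; the keys and default values are exactly those shown above, and `/etc/rancher/rke2/config.yaml` is the default RKE2 config path.

```shell
#!/bin/sh
# Print an RKE2 config with the CIS profile enabled.
# Arguments (both optional): profile name, CNI name.
render_rke2_config() {
  profile=${1:-cis-1.23}
  cni=${2:-calico}
  cat <<EOF
write-kubeconfig-mode: 644
cni: ${cni}
cloud-provider-name: "aws"
profile: "${profile}"
selinux: true
EOF
}

# Print the default config to stdout.
render_rke2_config
```

On a server the output would typically be written with something like `render_rke2_config | sudo tee /etc/rancher/rke2/config.yaml`.
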

This requires enabling an extra config on top of what is necessary for clustering, and adding the ability to inject a script that preps the OS for running RKE2 after install but before first start.
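
The ordering described here (install, then an optional prep script, then first start) might look like the sketch below. This is not how the Terraform modules implement it; `provision` and `run_prestart_hook` are hypothetical names, and the install and start commands are passed in as strings so the flow can be tried with harmless placeholders.

```shell
#!/bin/sh
# Run the optional prep script, if one was supplied.
run_prestart_hook() {
  hook=$1
  if [ -z "$hook" ]; then
    echo "no prep script configured"
    return 0
  fi
  if [ ! -f "$hook" ]; then
    echo "prep script not found: $hook" >&2
    return 1
  fi
  sh "$hook"
}

# Install, prep, then start. A failure at any stage stops the chain,
# so RKE2 is never started on an unprepared OS.
# Arguments: install command, prep script path (may be empty), start command.
provision() {
  sh -c "$1" && run_prestart_hook "$2" && sh -c "$3"
}

# Demonstration with placeholder commands.
# On a real server the install command could be
#   curl -sfL https://get.rke2.io | sh -
# and the start command could be
#   systemctl enable --now rke2-server.service
provision 'echo "install rke2"' '' 'echo "start rke2"'
```

The prep script is where the CIS sysctl copy and the etcd user creation shown earlier would go.
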

matttrach commented 11 months ago

Enabling the RHEL8 STIG AMI: https://github.com/rancher/terraform-aws-server/pull/20

matttrach commented 11 months ago

The changes there will need to be propagated to the install and rke2 modules and their examples. Then we should be able to inject a script to install the SELinux policies before starting RKE2.

matttrach commented 11 months ago

Propagate CIS to install module with example cis configuration: https://github.com/rancher/terraform-null-rke2-install/pull/51

matttrach commented 11 months ago

I am currently working on adding a local repo to the server to enable air-gapped RPM installs with SELinux enforcing on the CIS AMI.
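
For illustration, a local file-based repo definition could be generated as sketched below. This is an assumption about the general approach, not the module's actual implementation: the repo ID `rke2-local`, the directory `/opt/rke2-rpms`, and the `render_local_repo` helper are all hypothetical.

```shell
#!/bin/sh
# Print a dnf/yum repo definition that points at a local directory of RPMs.
# Arguments: repo ID, absolute directory path, gpgcheck value (default 1).
render_local_repo() {
  repo_id=$1
  repo_dir=$2
  gpgcheck=${3:-1}
  cat <<EOF
[${repo_id}]
name=Local RKE2 packages
baseurl=file://${repo_dir}
enabled=1
gpgcheck=${gpgcheck}
EOF
}

# Print an example definition to stdout.
render_local_repo rke2-local /opt/rke2-rpms

# On the server (not run here) the remaining steps would be roughly:
#   createrepo_c /opt/rke2-rpms
#   render_local_repo rke2-local /opt/rke2-rpms | sudo tee /etc/yum.repos.d/rke2-local.repo
#   sudo dnf install --disablerepo='*' --enablerepo=rke2-local rke2-server rke2-selinux
```

With `gpgcheck=1` the package signing key has to be imported on the server before the install will succeed.
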

matttrach commented 10 months ago

Status

matttrach commented 10 months ago

The latest changes to the aws-rke2 module include:

Next up:

matttrach commented 6 months ago

Prioritizing by difficulty/time consumption:

matttrach commented 6 months ago

Unfortunately these are not small items; it will take me some time to get these things figured out.

In the meantime, here is a repo showing how to get everything else running: https://github.com/rancher/terraform-aws-rke2-live-example

This has full IaC for an RKE2 node with an air-gapped server that you can only access via the AWS serial console. It deploys a "prototype" server, which has the access it needs to download its dependencies before it is shut down and turned into an image. The production server is then deployed from that image with an updated config that sets the proper IP addresses and join token. The repo is set up to be fully IaC, meaning that users manage their infrastructure like code artifacts in a repo, and it has CI to test and automatically deploy infrastructure. Secrets are encrypted, and the encryption is automatically rotated weekly. Each user has their own key to decrypt the secrets, and one exists for the CI that is not viewable without a code change.
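
The "updated config" step could be sketched as a small generator for the per-server values. This is an illustration, not the example repo's code: `render_join_config` is a hypothetical helper, the addresses and token are placeholders, IPv4 is assumed, and the `config.yaml.d` drop-in directory mentioned in the comment should be checked against the RKE2 docs for the version in use. The `server`, `node-ip`, and `token` keys and the supervisor port 9345 are standard RKE2 settings.

```shell
#!/bin/sh
# Print the per-server settings that differ between the prototype image
# and a production server.
# Arguments: node IP, join token, optional IP of an existing server to join.
render_join_config() {
  node_ip=$1
  token=$2
  server_ip=${3:-}
  # The first server of a cluster has nothing to join, so "server" is omitted.
  if [ -n "$server_ip" ]; then
    echo "server: https://${server_ip}:9345"
  fi
  echo "node-ip: ${node_ip}"
  echo "token: \"${token}\""
}

# Placeholder values; on a real server the output might be written to a
# drop-in such as /etc/rancher/rke2/config.yaml.d/50-join.yaml.
render_join_config 10.0.0.11 example-token 10.0.0.10
```

Keeping these values out of the image means the same AMI can be reused for every production server.
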

matttrach commented 6 months ago

State is stored encrypted in the repo, as is all of the access necessary for the CI to deploy. The CI uses the public GitHub runner and is completely free (3k minutes for a private repo, but unlimited for public; in my experience it is pretty hard to reach that 3k minutes using just one repo). Users don't need in-depth (or any) knowledge of Terraform to use the example, but maintainers will need to understand what they are looking at to make educated changes.

matttrach commented 6 months ago

CI access is created before every run and destroyed at the end, making it very limited. CI never has access to production servers (they don't have public IP addresses).
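
The create-then-always-destroy pattern can be sketched in shell with an `EXIT` trap. This only illustrates the idea; how the example repo actually provisions and revokes CI access is not shown in this thread. `with_ephemeral_access` and `CI_CRED_FILE` are hypothetical names, and a temporary file stands in for the real credential.

```shell
#!/bin/sh
# Run a command with short-lived access that is destroyed afterwards,
# whether the command succeeds or fails.
with_ephemeral_access() {
  (
    # Create the access just before the run (a temp file stands in here).
    cred_file=$(mktemp)
    # The EXIT trap fires when this subshell ends, on success or failure.
    trap 'rm -f "$cred_file"' EXIT
    echo "temporary-credential" > "$cred_file"
    # Expose the credential only to the wrapped command.
    CI_CRED_FILE=$cred_file "$@"
    status=$?
    exit "$status"
  )
}

# Demonstration: the file exists during the run and is gone afterwards.
with_ephemeral_access sh -c 'echo "deploying with $CI_CRED_FILE"'
```

The wrapper returns the wrapped command's exit status, so a failed deploy still fails the CI job after cleanup.
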

matttrach commented 6 months ago

I am going to move this issue to our backlog as I don't have a clear timeline.

matttrach commented 6 months ago

This now aligns with https://github.com/rancher/rke2/issues/5541. I will make sure to update both so everyone is on the same page, but that issue will have the most up-to-date information. I expect to implement items from there in the example repo, and I will add a summary here when I do.

matttrach commented 2 months ago

Dual-stack and SLE Micro are being propagated through the system; the next challenge is the embedded registry.