rancherfederal / rke2-aws-tf

MIT License

rke2-aws-tf

rke2 is lightweight, easy to use, and has minimal dependencies. As such, there is a tremendous amount of flexibility for deployments that can be tailored to best suit your needs and those of your organization.

This repository is intended to clearly demonstrate one method of deploying rke2 on AWS in a highly available, resilient, scalable, and simple way. It is by no means the only supported solution for running rke2 on AWS.

We highly recommend you use the modules in this repository as stepping stones toward solutions that meet the needs of your workflow and organization. If you have suggestions or see areas for improvement, we would love to hear them!

Non-Backwards compatible changes

Changes have been introduced as of March 2023 that are not compatible with user-defined environments. Please make note of and test the following changes before deploying into your environments:

- The rke2 user is no longer installed by default for both servers and agents.
- cloud-init runcmd scripts have been re-numbered as follows:

Usage

This repository contains 2 terraform modules intended for user consumption:

# Provision rke2 server(s) and controlplane loadbalancer
module "rke2" {
  source       = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
  cluster_name = "quickstart" # required input; see Inputs
  vpc_id       = "vpc-###"
  subnets      = ["subnet-###"]
  ami          = "ami-###"
}

# Provision Auto Scaling Group of agents to auto-join cluster
module "rke2_agents" {
  source  = "git::https://github.com/rancherfederal/rke2-aws-tf.git//modules/agent-nodepool"
  name    = "generic"
  vpc_id  = "vpc-###"
  subnets = ["subnet-###"]
  ami     = "ami-###"

  # Required input sourced from parent rke2 module, contains configuration that agents use to join existing cluster
  cluster_data = module.rke2.cluster_data
}

For more complete options, fully functioning examples are provided in the examples/ folder to meet the various use cases of rke2 clusters on AWS, ranging from:

Overview

The deployment model of this repository is designed to feel very similar to the major cloud providers' Kubernetes distributions.

It revolves around provisioning the following:

This IaC leverages the ease of use of rke2 to provide a simple SSH-less bootstrapping process for sets of cluster nodes, known as nodepools. Both the servers and agents within the cluster are simply one or more Auto Scaling Groups (ASG) with the minimal userdata required for either creating or joining an rke2 cluster.

Upon ASG boot, every node will:

  1. Install the rke2 self-extracting binary from https://get.rke2.io
  2. Fetch the rke2 cluster token from a secure secrets store (s3)
  3. Initialize or join an rke2 cluster
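The install step above is driven by a handful of module inputs listed in the Inputs table. The following is a minimal sketch, assuming the server module from the Usage section; the placeholder IDs and the choice of the "stable" channel are illustrative assumptions, not recommendations from this repository.

```hcl
module "rke2" {
  source       = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
  cluster_name = "quickstart"
  vpc_id       = "vpc-###"
  subnets      = ["subnet-###"]
  ami          = "ami-###"

  # Step 1: best-effort download of rke2 and the aws cli at boot.
  # Set to false if both are already baked into the AMI and on $PATH.
  download                = true
  rke2_install_script_url = "https://get.rke2.io"
  awscli_url              = "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"

  # Assumption: track a release channel rather than pinning rke2_version.
  rke2_channel = "stable"
}
```

Setting download to false is mainly useful for air-gapped or pre-baked images, where the node cannot or should not reach the public install endpoints.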

The most basic deployment involves a server nodepool. However, most deployments will see a server nodepool with one or more logical groups of agent nodepools. These are typically separated based on node labels, workload functions, instance types, or any physical/logical separation of nodes.
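Several agent nodepools can be declared from one module block. The sketch below assumes a module "rke2" like the one in the Usage section already exists, and it would take the place of the single rke2_agents block shown there. The nodepool names are hypothetical, and only the agent inputs shown in Usage are used.

```hcl
locals {
  # Hypothetical logical groups of agents; rename to match your workloads.
  agent_nodepools = toset(["general", "batch"])
}

module "rke2_agents" {
  # One Auto Scaling Group of agents per nodepool name.
  for_each = local.agent_nodepools

  source  = "git::https://github.com/rancherfederal/rke2-aws-tf.git//modules/agent-nodepool"
  name    = each.key
  vpc_id  = "vpc-###"
  subnets = ["subnet-###"]
  ami     = "ami-###"

  # Join configuration sourced from the server module.
  cluster_data = module.rke2.cluster_data
}
```

Because each nodepool is its own module instance, pools can be added or removed by editing the set without touching the server nodepool.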

Terraform Modules

This repository contains 2 primary modules that users are expected to consume:

rke2:

The primary rke2 cluster component. Defining this is mandatory, and will provision a control plane load balancer (AWS NLB) and a server nodepool.

agent-nodepool

Optional (but recommended) cluster component to create agent nodepools that will auto-join the cluster created using the rke2 module. This is the primary method for defining nodes in which cluster workloads will run.

Secrets

Since it is bad practice to store sensitive information in userdata, S3 is used as a secure secrets store for storing and fetching the token, because it is commonly available in all instantiations of AWS. Provisioned nodes will fetch the token from S3 via the awscli before attempting to join a cluster.

IAM Policies

This module has a minimum dependency on being able to fetch the cluster join token from an S3 bucket. By default, the bucket, token, roles, and minimum policies will be created for you. For restricted environments unable to create IAM Roles or Policies, you can specify an existing IAM Role that the instances will assume instead. Note that when going this route, you must be sure the IAM role specified has the minimum required policies to fetch the cluster token from S3. The required and optional policies are defined below:
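For restricted environments, the iam_instance_profile input from the Inputs table is the hook for supplying a pre-existing role. A minimal sketch, assuming an instance profile named "existing-rke2-server-profile" (a hypothetical name) has already been created with the required policies attached:

```hcl
module "rke2" {
  source       = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
  cluster_name = "quickstart"
  vpc_id       = "vpc-###"
  subnets      = ["subnet-###"]
  ami          = "ami-###"

  # Hypothetical pre-existing instance profile. When left blank (the
  # default), the module creates the role and policies itself.
  iam_instance_profile = "existing-rke2-server-profile"
}
```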

Required Policies

Required policies are created by default, but are specified below if you are using a custom IAM role.

Get Token

Servers and agents need to be able to fetch the cluster join token

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:<aws-partition>:s3:::<bucket>/<object>"
        }
    ]
}

Note: The S3 bucket is created dynamically during cluster creation. To pre-create an IAM policy that points to this bucket, the use of wildcards is recommended. For example, in the aws-us-gov partition: arn:aws-us-gov:s3:::${var.cluster_name}-*
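When managing a custom role in Terraform, the same policy can be written with the AWS provider's aws_iam_policy_document data source. This is a sketch only: the variable names and the policy name are hypothetical, and it assumes the bucket name begins with the cluster name, as the wildcard note above suggests.

```hcl
variable "cluster_name" {
  type = string
}

# Hypothetical: name of the pre-existing role the nodes will assume.
variable "node_role_name" {
  type = string
}

# Resolves to "aws", "aws-us-gov", or "aws-cn" for the current credentials.
data "aws_partition" "current" {}

data "aws_iam_policy_document" "get_token" {
  statement {
    effect  = "Allow"
    actions = ["s3:GetObject"]

    # S3 ARNs carry no region or account ID. The wildcard covers the
    # dynamically named bucket and the token object inside it.
    resources = [
      "arn:${data.aws_partition.current.partition}:s3:::${var.cluster_name}-*",
    ]
  }
}

resource "aws_iam_role_policy" "get_token" {
  name   = "${var.cluster_name}-get-token"
  role   = var.node_role_name
  policy = data.aws_iam_policy_document.get_token.json
}
```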

Get Autoscaling Instances

Servers need to be able to query instances within their autoscaling group for "leader election".

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Resource": "*",
            "Action": [
              "autoscaling:DescribeAutoScalingGroups",
              "autoscaling:DescribeAutoScalingInstances"
            ]
        }
    ]
}
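The JSON above can likewise be expressed in Terraform for a custom server role. A minimal sketch, assuming a hypothetical variable holding the name of the existing server role:

```hcl
# Hypothetical: name of the pre-existing role assumed by server nodes.
variable "server_role_name" {
  type = string
}

data "aws_iam_policy_document" "describe_asg" {
  statement {
    effect = "Allow"
    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
    ]
    # Mirrors the "Resource": "*" entry in the JSON policy above.
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "describe_asg" {
  name   = "rke2-describe-asg"
  role   = var.server_role_name
  policy = data.aws_iam_policy_document.describe_asg.json
}
```

Only servers need this policy; agents do not take part in leader election.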

Optional Policies

Optional policies can be created for you when the corresponding inputs are enabled, but are specified below if you are using a custom IAM role.
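The toggles for these policies are the enable_ccm and enable_autoscaler inputs listed in the Inputs table, both of which default to false. A minimal sketch of turning them on, with placeholder IDs:

```hcl
module "rke2" {
  source       = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
  cluster_name = "quickstart"
  vpc_id       = "vpc-###"
  subnets      = ["subnet-###"]
  ami          = "ami-###"

  # Ensure the IAM policies for an AWS-aware cluster are present.
  enable_ccm = true

  # Ensure the IAM policies required by the cluster autoscaler are present.
  enable_autoscaler = true
}
```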

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3 |
| aws | >= 4.6, <= 5.22 |
| cloudinit | >= 2 |
| random | >= 3 |

Providers

| Name | Version |
|------|---------|
| aws | >= 4.6, <= 5.22 |
| cloudinit | >= 2 |
| random | >= 3 |

Modules

| Name | Source | Version |
|------|--------|---------|
| cp_lb | ./modules/nlb | n/a |
| iam | ./modules/policies | n/a |
| init | ./modules/userdata | n/a |
| servers | ./modules/nodepool | n/a |
| statestore | ./modules/statestore | n/a |

Resources

| Name | Type |
|------|------|
| aws_iam_role_policy.aws_autoscaler | resource |
| aws_iam_role_policy.aws_ccm | resource |
| aws_iam_role_policy.aws_required | resource |
| aws_iam_role_policy.get_token | resource |
| aws_iam_role_policy.put_kubeconfig | resource |
| aws_security_group.cluster | resource |
| aws_security_group.server | resource |
| aws_security_group_rule.cluster_egress | resource |
| aws_security_group_rule.cluster_shared | resource |
| aws_security_group_rule.server_cp | resource |
| aws_security_group_rule.server_cp_supervisor | resource |
| random_password.token | resource |
| random_string.uid | resource |
| aws_iam_policy_document.aws_autoscaler | data source |
| aws_iam_policy_document.aws_ccm | data source |
| aws_iam_policy_document.aws_required | data source |
| aws_iam_role.provided | data source |
| cloudinit_config.this | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| ami | Server pool ami | string | n/a | yes |
| associate_public_ip_address | n/a | bool | null | no |
| awscli_url | URL for awscli zip file | string | "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" | no |
| block_device_mappings | Server pool block device mapping configuration | map(string) | { "encrypted": false, "size": 30 } | no |
| ccm_external | Set kubelet arg 'cloud-provider-name' value to 'external'. Requires manual install of CCM. | bool | false | no |
| cluster_name | Name of the rkegov cluster to create | string | n/a | yes |
| controlplane_access_logs_bucket | Bucket name for logging requests to control plane load balancer | string | "disabled" | no |
| controlplane_allowed_cidrs | Server pool security group allowed cidr ranges | list(string) | [ "0.0.0.0/0" ] | no |
| controlplane_enable_cross_zone_load_balancing | Toggle between controlplane cross zone load balancing | bool | true | no |
| controlplane_internal | Toggle between public or private control plane load balancer | bool | true | no |
| create_acl | Toggle creation of ACL for statestore bucket | bool | true | no |
| download | Toggle best effort download of rke2 dependencies (rke2 and aws cli), if disabled, dependencies are assumed to exist in $PATH | bool | true | no |
| enable_autoscaler | Toggle enabling policies required for cluster autoscaler to work | bool | false | no |
| enable_ccm | Toggle enabling the cluster as aws aware, this will ensure the appropriate IAM policies are present | bool | false | no |
| extra_block_device_mappings | Used to specify additional block device mapping configurations | list(map(string)) | [] | no |
| extra_cloud_config_config | extra config to append to cloud-config | string | "" | no |
| extra_security_group_ids | List of additional security group IDs | list(string) | [] | no |
| iam_instance_profile | Server pool IAM Instance Profile, created if left blank (default behavior) | string | "" | no |
| iam_permissions_boundary | If provided, the IAM role created for the servers will be created with this permissions boundary attached. | string | null | no |
| instance_type | Server pool instance type | string | "t3a.medium" | no |
| lb_subnets | List of subnet IDs to create load balancer in | list(string) | null | no |
| metadata_options | Instance Metadata Options | map(any) | { "http_endpoint": "enabled", "http_put_response_hop_limit": 2, "http_tokens": "required", "instance_metadata_tags": "disabled" } | no |
| post_userdata | Custom userdata to run immediately after rke2 node attempts to join cluster | string | "" | no |
| pre_userdata | Custom userdata to run immediately before rke2 node attempts to join cluster, after required rke2, dependencies are installed | string | "" | no |
| rke2_channel | Channel to use for RKE2 server nodepool | string | null | no |
| rke2_config | Server pool additional configuration passed as rke2 config file, see https://docs.rke2.io/install/install_options/server_config for full list of options | string | "" | no |
| rke2_install_script_url | URL for RKE2 install script | string | "https://get.rke2.io" | no |
| rke2_start | Start/Stop value for the rke2-server/agent service. This will prevent the service from starting until the next reboot. True=start, False= don't start. | bool | true | no |
| rke2_version | Version to use for RKE2 server nodepool | string | null | no |
| servers | Number of servers to create | number | 3 | no |
| spot | Toggle spot requests for server pool | bool | false | no |
| ssh_authorized_keys | Server pool list of public keys to add as authorized ssh keys | list(string) | [] | no |
| statestore_attach_deny_insecure_transport_policy | Toggle for enabling s3 policy to reject non-SSL requests | bool | true | no |
| subnets | List of subnet IDs to create nodes in | list(string) | n/a | yes |
| suspended_processes | List of processes to suspend in the autoscaling service | list(string) | [] | no |
| tags | Map of tags to add to all resources created | map(string) | {} | no |
| termination_policies | List of policies to decide how the instances in the Auto Scaling Group should be terminated | list(string) | [ "Default" ] | no |
| unique_suffix | Enables/disables generation of a unique suffix to cluster name | bool | true | no |
| unzip_rpm_url | URL path to unzip rpm | string | "" | no |
| vpc_id | VPC ID to create resources in | string | n/a | yes |
| wait_for_capacity_timeout | How long Terraform should wait for ASG instances to be healthy before timing out. | string | "10m" | no |
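To illustrate how several of these inputs combine, here is a sketch of a server module block that overrides a few defaults. All values are illustrative assumptions: the CIDR range, volume size, tag, and the node-label entry in rke2_config are hypothetical and should be replaced with values that suit your environment.

```hcl
module "rke2" {
  source = "git::https://github.com/rancherfederal/rke2-aws-tf.git"

  # Required inputs
  cluster_name = "quickstart"
  vpc_id       = "vpc-###"
  subnets      = ["subnet-###"]
  ami          = "ami-###"

  # Server pool sizing
  servers       = 3
  instance_type = "t3a.medium"

  # Keep the control plane load balancer private and restrict access
  # to a hypothetical internal range.
  controlplane_internal      = true
  controlplane_allowed_cidrs = ["10.0.0.0/16"]

  # Larger, encrypted root volume (the defaults are 30 and false).
  block_device_mappings = {
    size      = 50
    encrypted = true
  }

  # Extra rke2 server configuration, passed through as a config file.
  rke2_config = <<-EOT
    node-label:
      - "name=server"
  EOT

  tags = {
    Environment = "dev"
  }
}
```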

Outputs

| Name | Description |
|------|-------------|
| cluster_data | Map of cluster data required by agent pools for joining cluster, do not modify this |
| cluster_name | Name of the rke2 cluster |
| cluster_sg | Security group shared by cluster nodes, this is different than nodepool security groups |
| iam_instance_profile | IAM instance profile attached to server nodes |
| iam_role | IAM role of server nodes |
| iam_role_arn | IAM role arn of server nodes |
| kubeconfig_path | n/a |
| server_nodepool_arn | n/a |
| server_nodepool_id | n/a |
| server_nodepool_name | n/a |
| server_sg | n/a |
| server_url | n/a |
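A short sketch of consuming these outputs from a root module follows. It assumes a module "rke2" block is defined as in the Usage section and that cluster_sg resolves to a security group ID, which this document does not state explicitly. The output names and the CIDR range are hypothetical.

```hcl
# Surface connection details from the server module.
output "rke2_server_url" {
  value = module.rke2.server_url
}

output "rke2_kubeconfig_path" {
  value = module.rke2.kubeconfig_path
}

# Assumption: cluster_sg is a security group ID. This rule allows SSH
# to all cluster nodes from a hypothetical internal range.
resource "aws_security_group_rule" "internal_ssh" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  security_group_id = module.rke2.cluster_sg
  cidr_blocks       = ["10.0.0.0/16"]
}
```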