Open DaMandal0rian opened 3 months ago
PR Description updated to latest commit (https://github.com/subspace/infra/commit/2656d95a4cf69b300e27d9326e77f80b78f19ac0)
⏱️ Estimated effort to review [1-5] | 5, due to the extensive number of files and changes involved in this PR, including multiple Terraform modules, Packer configurations, and Lambda function implementations. The complexity and interdependencies between these components significantly increase the review effort required to ensure correctness, security, and adherence to best practices. |
🧪 Relevant tests | No |
🔍 Possible issues | Possible Security Concern: The PR introduces numerous AWS resources and Lambda functions. Without proper IAM policies and security group configurations, there might be potential security risks, such as overly permissive access or unintended exposure of resources. |
Possible Performance Issue: The use of ephemeral runners and on-demand scaling based on GitHub events can lead to unpredictable costs and performance implications under high load or misconfiguration. | |
Dependency Management: The PR includes external dependencies (e.g., GitHub Actions runner binaries, Docker, AWS CLI). Changes in these dependencies could impact the stability and functionality of the setup. | |
🔒 Security concerns | - IAM Policies: Ensure that IAM roles and policies assigned to Lambda functions and EC2 instances follow the principle of least privilege to minimize security risks. - Public Exposure: Verify that security groups and network access controls are correctly configured to prevent unauthorized access to resources. |
relevant file | github-runners/terraform/autoscaling/modules/runners/main.tf |
suggestion | Consider adding lifecycle hooks to the `aws_launch_template` resource to manage updates and replacements more gracefully. This can help in ensuring that instances are only replaced according to the defined policies, minimizing disruptions. [important] |
relevant line | resource "aws_launch_template" "runner" { |
relevant file | github-runners/terraform/autoscaling/modules/runner-binaries-syncer/runner-binaries-syncer.tf |
suggestion | Implement error handling and retry mechanisms in the Lambda function responsible for syncing runner binaries. This will improve the robustness of the setup by handling transient errors that may occur during the download or update process. [important] |
relevant line | resource "aws_lambda_function" "syncer" { |
relevant file | github-runners/terraform/autoscaling/gh-runners/images/ubuntu-jammy/github_agent.ubuntu.pkr.hcl |
suggestion | Optimize the Packer build by using a more specific source AMI filter to reduce build times and potential costs. Narrowing down the search to a more recent base AMI can also ensure that the runners are built on a more secure and updated OS. [medium] |
relevant line | source_ami_filter { |
relevant file | github-runners/terraform/autoscaling/modules/runners/pool/main.tf |
suggestion | Add monitoring and alerting for the Lambda function that adjusts the runner pool size. Monitoring metrics such as invocation errors, duration, and throttles can help in identifying issues early and maintaining the desired state of runner pools. [medium] |
relevant line | resource "aws_lambda_function" "pool" { |
Category | Suggestions |
Enhancement |
Use dynamic blocks for tags to improve code maintainability.___ **Consider using a dynamic block fortags to improve maintainability and readability. This approach allows you to easily add or remove tags without modifying the structure of each resource definition.** [github-runners/terraform/autoscaling/gh-runners/ephemeral/main.tf [25-27]](https://github.com/subspace/infra/pull/302/files#diff-8d5361c04f1c28f11368d39d585dd8e835295a11af3965041d9ed5de301a4319R25-R27) ```diff -tags = { - Project = "subspace-scale-runners" +dynamic "tags" { + for_each = { + Project = "subspace-scale-runners" + # Add more tags here + } + content { + key = tags.key + value = tags.value + } } ``` |
Change
___
**For the | |
Use multiple subnets for Lambda functions for higher availability.___ **To ensure high availability and fault tolerance, consider specifying multiple subnet IDsfor the Lambda functions by using a list of subnets rather than a single subnet. This allows the Lambda functions to be deployed across multiple Availability Zones.** [github-runners/terraform/autoscaling/main.tf [245]](https://github.com/subspace/infra/pull/302/files#diff-69de1c5f1c6562547995b664c547cd9ebd9460c2ad1c756cb4c2f9482bdf73feR245-R245) ```diff -lambda_subnet_ids = var.lambda_subnet_ids +lambda_subnet_ids = var.lambda_subnet_ids_list ``` | |
Add a condition to ensure
___
**To ensure that the lambda function only uses the S3 bucket when necessary, consider adding | |
Best practice |
Parameterize instance types for flexibility.___ **To ensure the scalability and flexibility of your Terraform configuration, considerparameterizing the instance_types to allow different instance types for different environments or use cases.** [github-runners/terraform/autoscaling/gh-runners/ephemeral/main.tf [49]](https://github.com/subspace/infra/pull/302/files#diff-8d5361c04f1c28f11368d39d585dd8e835295a11af3965041d9ed5de301a4319R49-R49) ```diff -instance_types = ["m6a.4xlarge", "c6a.4xlarge"] +instance_types = var.instance_types ``` |
Avoid hardcoding AWS region for security and flexibility.___ **To avoid potential security risks, consider removing the hardcoded AWS region and insteaduse a variable. This will also increase the flexibility of your Terraform configuration.** [github-runners/terraform/autoscaling/gh-runners/ephemeral/main.tf [3]](https://github.com/subspace/infra/pull/302/files#diff-8d5361c04f1c28f11368d39d585dd8e835295a11af3965041d9ed5de301a4319R3-R3) ```diff -aws_region = "us-west-2" +aws_region = var.aws_region ``` | |
Use a more specific version constraint for the
___
**It's recommended to use a more specific version constraint for the | |
Use the null coalescing operator for default values to improve code readability.___ **For better maintainability and readability, consider using Terraform'snull coalescing operator instead of ternary operators for default values. This operator simplifies the expression and makes the code cleaner.** [github-runners/terraform/autoscaling/modules/ami-housekeeper/main.tf [2-3]](https://github.com/subspace/infra/pull/302/files#diff-1a780d161895dc12d02dbf91b2df5df854fb2ffe2a469dfb7b8d4933313cbeeeR2-R3) ```diff -lambda_zip = var.lambda_zip == null ? "${path.module}/../../lambdas/functions/ami-housekeeper/ami-housekeeper.zip" : var.lambda_zip -role_path = var.role_path == null ? "/${var.prefix}/" : var.role_path +lambda_zip = var.lambda_zip ?? "${path.module}/../../lambdas/functions/ami-housekeeper/ami-housekeeper.zip" +role_path = var.role_path ?? "/${var.prefix}/" ``` | |
Maintainability |
Use a local variable for
___
**For better maintainability and to avoid duplication, consider using a local variable or a |
Move variable definitions to a separate file for better maintainability.___ **To improve the maintainability and readability of the code, consider using a separatevariables file for defining variables. This helps in separating the configuration from the template logic and makes the template cleaner.** [github-runners/terraform/autoscaling/gh-runners/images/ubuntu-jammy-arm64/github_agent.ubuntu.pkr.hcl [10-12]](https://github.com/subspace/infra/pull/302/files#diff-36f4c64d06ada4a3d8018594aa00af8ddb264389da8c726d97dcfdd31d54be87R10-R12) ```diff -variable "runner_version" { - description = "The version (no v prefix) of the runner software to install https://github.com/actions/runner/releases. The latest release will be fetched from GitHub if not provided." - default = null -} +# Variables should be defined in a separate file, e.g., variables.tf ``` | |
Use a variable for Terraform required version.___ **To improve maintainability and avoid hardcoding, consider using a Terraform variable forthe required_version of Terraform in versions.tf . This allows for easier updates and version management.** [github-runners/terraform/autoscaling/main.tf [2]](https://github.com/subspace/infra/pull/302/files#diff-69de1c5f1c6562547995b664c547cd9ebd9460c2ad1c756cb4c2f9482bdf73feR2-R2) ```diff -required_version = ">= 1" +required_version = var.terraform_required_version ``` | |
Use a separate file for IAM policies and roles to improve maintainability.___ **To improve the maintainability of your Terraform code, consider using a separate file forIAM policies and roles. This approach makes it easier to manage and review IAM policies, especially as they grow in complexity.** [github-runners/terraform/autoscaling/modules/ami-housekeeper/main.tf [57-63]](https://github.com/subspace/infra/pull/302/files#diff-1a780d161895dc12d02dbf91b2df5df854fb2ffe2a469dfb7b8d4933313cbeeeR57-R63) ```diff +# In a separate file, e.g., iam.tf resource "aws_iam_role" "ami_housekeeper" { name = "${var.prefix}-ami-housekeeper-role" assume_role_policy = data.aws_iam_policy_document.lambda_assume_role_policy.json path = local.role_path permissions_boundary = var.role_permissions_boundary tags = var.tags } ``` | |
Security |
Review security practices when using public IP addresses.___ **For better security practices, avoid using public IP addresses if not necessary. Ifassociate_public_ip_address is set to true, ensure that the instances are secured and only necessary ports are exposed.** [github-runners/terraform/autoscaling/gh-runners/images/ubuntu-jammy-arm64/github_agent.ubuntu.pkr.hcl [104]](https://github.com/subspace/infra/pull/302/files#diff-36f4c64d06ada4a3d8018594aa00af8ddb264389da8c726d97dcfdd31d54be87R104-R104) ```diff -associate_public_ip_address = var.associate_public_ip_address +# Ensure security measures are in place if using public IP addresses ``` |
Dynamically fetch AMI names using AWS SSM Parameter Store for enhanced security.___ **To ensure the EC2 instances are using the latest and most secure AMIs, considerdynamically fetching the source_ami_filter name using AWS SSM Parameter Store instead of hardcoding the AMI name pattern.** [github-runners/terraform/autoscaling/gh-runners/images/ubuntu-jammy-arm64/github_agent.ubuntu.pkr.hcl [107-115]](https://github.com/subspace/infra/pull/302/files#diff-36f4c64d06ada4a3d8018594aa00af8ddb264389da8c726d97dcfdd31d54be87R107-R115) ```diff source_ami_filter { filters = { - name = "*ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-arm64-server-*" + name = "{{ data.aws_ssm_parameter.ubuntu_ami.value }}" root-device-type = "ebs" virtualization-type = "hvm" } most_recent = true owners = ["099720109477"] } ``` | |
Restrict SQS queue policy actions to only necessary ones.___ **Consider using a more restrictive condition for the SQS queue policy to limit the actionsto only what's necessary, rather than allowing all SQS actions ( "sqs:*" ). This follows the principle of least privilege, enhancing the security of your Terraform configuration.** [github-runners/terraform/autoscaling/main.tf [33-35]](https://github.com/subspace/infra/pull/302/files#diff-69de1c5f1c6562547995b664c547cd9ebd9460c2ad1c756cb4c2f9482bdf73feR33-R35) ```diff actions = [ - "sqs:*" + "sqs:SendMessage", + "sqs:ReceiveMessage", + "sqs:DeleteMessage", + "sqs:GetQueueAttributes" ] ``` | |
Enable server-side encryption on SQS queues by default.___ **For better security, consider enabling server-side encryption on the SQS queues bydefault, unless there's a specific reason to disable it. This can be done by setting sqs_managed_sse_enabled to true in the SQS resource configurations.**
[github-runners/terraform/autoscaling/main.tf [73]](https://github.com/subspace/infra/pull/302/files#diff-69de1c5f1c6562547995b664c547cd9ebd9460c2ad1c756cb4c2f9482bdf73feR73-R73)
```diff
-sqs_managed_sse_enabled = var.queue_encryption.sqs_managed_sse_enabled
+sqs_managed_sse_enabled = true
```
| |
Specify specific ARNs in IAM policy resources instead of using a wildcard.___ **It's recommended to avoid using a broad wildcard ("*" ) for IAM policy resources when possible. Specify the ARNs of the resources that the policy should apply to, enhancing the security by following the principle of least privilege.** [github-runners/terraform/autoscaling/main.tf [38-39]](https://github.com/subspace/infra/pull/302/files#diff-69de1c5f1c6562547995b664c547cd9ebd9460c2ad1c756cb4c2f9482bdf73feR38-R39) ```diff resources = [ - "*" + "arn:aws:sqs:region:account-id:queue-name" ] ``` | |
Specify a more restrictive
___
**To enhance security, consider specifying a more restrictive |
User description
Terraform module Self-Hosted Scalable GitHub Actions runners on AWS based on the official Philips AWS Module. https://github.com/philips-labs/terraform-aws-github-runner under MIT License.
This PR removes some features and modules not required for our infra use case and makes use of more granular modules.
Scale up and down based on GitHub events Scale down to zero when no jobs are running Runners are created on-demand and terminated after use (ephemeral runners) Runners are created on spot instances use custom AMI, define the instance types and subnets to use. OS support: Linux (x64/arm64) and Windows CI workflow for deployment
Type
enhancement
Description
Changes walkthrough
variables.tf
Add Variables for Configuring AWS, GitHub App, Runners, and Lambdas
github-runners/terraform/autoscaling/variables.tf
VPC, subnets, tags, GitHub app parameters, runner settings, and more.
enabling ephemeral runners, job queued check, and managed runner
security group.
size, timeout, and S3 bucket details for lambda functions.
OS support.
main.tf
Configure AWS Resources, Runners, and Lambda Functions
github-runners/terraform/autoscaling/main.tf
module, and runners module.
main.tf
Setup AWS Launch Template and Security Groups for Runners
github-runners/terraform/autoscaling/modules/runners/main.tf
instance type, security groups, and user data.
variables.tf
Add Variables for Runner Binaries Syncer Lambda Configuration
github-runners/terraform/autoscaling/modules/runner-binaries-syncer/variables.tf
function.
details, and logging.
runner-binaries-syncer.tf
Configure Runner Binaries Syncer Lambda and Trigger
github-runners/terraform/autoscaling/modules/runner-binaries-syncer/runner-binaries-syncer.tf
policies, and cloudwatch event rule.
deployment.
main.tf
Setup Lambda Function for Runner Pool Management
github-runners/terraform/autoscaling/modules/runners/pool/main.tf
on schedule.
variables.tf
Add Variables for Termination Watcher Lambda Configuration
github-runners/terraform/autoscaling/modules/termination-watcher/variables.tf
function.
and environment variables.