hardfinhq / terraform-aws-tailscale-subnet-router

Terraform module for Tailscale subnet router in AWS ECS Fargate
https://registry.terraform.io/modules/hardfinhq/tailscale-subnet-router/aws
Apache License 2.0
aws ecs fargate tailscale terraform

Terraform module for Tailscale subnet router in ECS Fargate

This module deploys a Tailscale subnet router as an AWS Fargate ECS task. The subnet router runs within an AWS VPC and advertises (to the Tailnet) the entire CIDR block for that VPC.

Docker Container

The _docker/tailscale.Dockerfile file extends the tailscale/tailscale image with an entrypoint script that starts the Tailscale daemon and runs tailscale up using an auth key and the relevant advertised CIDR block.

This Docker image must be built and pushed to an ECR repository.

docker build \
  --tag tailscale-subnet-router:v1.20230311.1 \
  --file ./_docker/tailscale.Dockerfile \
  .

# Optionally override the tag for the base `tailscale/tailscale` image
docker build \
  --build-arg TAILSCALE_TAG=v1.38.4 \
  --tag tailscale-subnet-router:v1.20230311.1 \
  --file ./_docker/tailscale.Dockerfile \
  .
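
The image needs an ECR repository to live in. As a minimal sketch, assuming the repository is managed in Terraform alongside this module, it could be declared as follows. The resource label, repository name, and output name are illustrative assumptions and are not defined by this module.

```hcl
# Hypothetical ECR repository for the subnet router image.
# The name "tailscale-subnet-router" mirrors the local tag used above
# and is an assumption, not something this module requires.
resource "aws_ecr_repository" "tailscale_subnet_router" {
  name = "tailscale-subnet-router"

  # Versioned tags such as v1.20230311.1 should not be overwritten.
  image_tag_mutability = "IMMUTABLE"
}

# The repository URL is the prefix to use when tagging the local image
# before pushing it.
output "tailscale_subnet_router_repository_url" {
  value = aws_ecr_repository.tailscale_subnet_router.repository_url
}
```

After the repository exists, the locally built image can be re-tagged with the repository URL and pushed with the usual Docker and AWS CLI workflow.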

Operator's Notes

Room for Improvement

Throughput

Right now this module deploys exactly one subnet router per VPC. As an organization grows, that single router can become saturated and turn into a bottleneck. One of the perks of a mesh VPN is that bottlenecks via a centralized controller aren't possible, so reintroducing one is unfortunate.

The best way to avoid this bottleneck is to not use a subnet router at all, but many engineering organizations can't (or don't want to) run Tailscale as a sidecar for all workloads. Assuming a subnet router will be used, there are a few ways bottlenecks can be mitigated:

State

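In its current form, this module uses AWS EFS to persist the Tailscale state in /var/lib/tailscale across deploys. An alternative would be to keep the state in AWS SSM Parameter Store by passing a parameter ARN to tailscaled: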

tailscaled --state arn:aws:ssm:zz-minotaur-7:123456789012:parameter/sandbox-tailscale
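
If state were moved to an SSM parameter like the one above, the ECS task role would need permission to read and write that parameter. The following is a rough sketch under that assumption. The variable task_role_name is hypothetical, the action list is our best understanding of what the Tailscale SSM state store calls, and a SecureString parameter encrypted with a customer-managed KMS key would need additional KMS permissions.

```hcl
# Hypothetical input: name of the IAM role assumed by the ECS task.
variable "task_role_name" {
  type        = string
  description = "Name of the ECS task role that runs tailscaled (assumed)"
}

# Allow reading and writing the single parameter that holds Tailscale state.
# The ARN is the placeholder from the example command above.
data "aws_iam_policy_document" "tailscale_state" {
  statement {
    effect    = "Allow"
    actions   = ["ssm:GetParameter", "ssm:PutParameter"]
    resources = ["arn:aws:ssm:zz-minotaur-7:123456789012:parameter/sandbox-tailscale"]
  }
}

# Attach the policy inline to the task role.
resource "aws_iam_role_policy" "tailscale_state" {
  name   = "tailscale-state"
  role   = var.task_role_name
  policy = data.aws_iam_policy_document.tailscale_state.json
}
```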

VPC

This module assumes the VPC is identified by its Name tag and looks it up with a data source equivalent to:

data "aws_vpc" "sandbox" {
  tags = {
    Name = "sandbox"
  }
}

We'd be open to accepting a vpc_id directly.
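
One way this could look, as a sketch rather than the module's actual interface: accept an optional vpc_id and fall back to the Name tag lookup when it is not set. The variable names vpc_id and vpc_name and the data source label are assumptions made for illustration.

```hcl
# Hypothetical optional input; when set, it takes precedence over the name.
variable "vpc_id" {
  type        = string
  default     = null
  description = "ID of the VPC; if null, the VPC is looked up by Name tag"
}

# Hypothetical input holding the value of the VPC Name tag.
variable "vpc_name" {
  type        = string
  default     = null
  description = "Value of the Name tag on the VPC, used when vpc_id is null"
}

# Null arguments are treated as unset, so exactly one selector is active.
data "aws_vpc" "selected" {
  id   = var.vpc_id
  tags = var.vpc_id == null ? { Name = var.vpc_name } : null
}

# The CIDR block that the subnet router advertises to the Tailnet.
output "advertised_cidr_block" {
  value = data.aws_vpc.selected.cidr_block
}
```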

Subnet group

The subnet_group variable is of note; it is used to filter subnets tagged with group={subnet_group}. This is a convention we use at Hardfin to group together subnets that are part of the same VPC (usually one subnet per AZ). In Terraform, this is determined via:

data "aws_subnets" "primary" {
  filter {
    name   = "vpc-id"
    values = ["vpc-51edfd86d3223cdff"]
  }
  tags = {
    group = "sandbox-igw-zz-minotaur-7"
  }
}

We'd be open to accepting an aws_subnet_ids list directly.
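
A sketch of how an explicit list could coexist with the group tag convention: skip the lookup when subnet IDs are supplied, and use the tagged subnets otherwise. The variables subnet_ids and vpc_id are hypothetical; subnet_group is the existing input described above. The one() function requires Terraform 0.15 or later.

```hcl
# Hypothetical optional input; when set, the tag-based lookup is skipped.
variable "subnet_ids" {
  type        = list(string)
  default     = null
  description = "Explicit subnet IDs; if null, subnets are found by group tag"
}

# Hypothetical input identifying the VPC to search in.
variable "vpc_id" {
  type = string
}

variable "subnet_group" {
  type        = string
  default     = null
  description = "Value of the group tag used to select subnets"
}

# Only query AWS when no explicit list was given.
data "aws_subnets" "primary" {
  count = var.subnet_ids == null ? 1 : 0

  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
  tags = {
    group = var.subnet_group
  }
}

locals {
  # one() returns the single looked-up list, or null if the lookup was skipped.
  subnet_ids = var.subnet_ids != null ? var.subnet_ids : one(data.aws_subnets.primary[*].ids)
}
```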