int128 / terraform-aws-nat-instance

Terraform module to provision a NAT Instance using an Auto Scaling Group and Spot Instance from $1/month
https://registry.terraform.io/modules/int128/nat-instance/aws/
Apache License 2.0

Support Amazon Linux 2023 #65

Open dwilkie opened 1 year ago

dwilkie commented 1 year ago

The current snat.sh script doesn't work on Amazon Linux 2023.

Here's my first attempt at an alternative snat.sh for Amazon Linux 2023:

#!/bin/bash -x

# wait for ens6
while ! ip link show dev ens6; do
  sleep 1
done

# NAT Instance Setup
# https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html#NATInstance

# enable IP forwarding and NAT on ens6
sysctl -q -w net.ipv4.ip_forward=1
sysctl -q -w net.ipv4.conf.ens6.send_redirects=0
/sbin/iptables -t nat -A POSTROUTING -o ens6 -j MASQUERADE
service iptables save  # requires the iptables-services package, which Amazon Linux 2023 does not install by default

# switch the default route to ens6

# use only the first default route's gateway, in case several exist
GATEWAY=$(ip route show default | awk '{ print $3; exit }')
ip route add "$GATEWAY" dev ens6
ip route add default via "$GATEWAY"
ip route del default dev ens5

# wait for network connection
curl --retry 10 http://www.example.com

# re-establish connections
systemctl restart amazon-ssm-agent

There are a couple of areas that could use improvement:

  1. Don't hardcode ens5 and ens6
  2. Persist the routes after a reboot
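
For item 1, one possible approach (a sketch, not part of the module) is to look up each ENI's MAC address by device index through the instance metadata service (IMDSv2), then match that MAC against /sys/class/net. The function names are hypothetical, and the sketch assumes IMDSv2 is reachable and reports MAC addresses in lowercase, as sysfs does.

```shell
#!/bin/bash
# Sketch: resolve interface names from ENI device indexes instead of
# hardcoding ens5/ens6. Function names are hypothetical.

IMDS=http://169.254.169.254/latest

# Fetch a short-lived IMDSv2 session token.
imds_token() {
  curl -sf -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300"
}

# Print the MAC address of the ENI attached at device index $1.
mac_for_device_index() {
  local index=$1 token mac
  token=$(imds_token) || return 1
  for mac in $(curl -sf -H "X-aws-ec2-metadata-token: $token" \
      "$IMDS/meta-data/network/interfaces/macs/"); do
    mac=${mac%/}  # IMDS lists each MAC with a trailing slash
    if [ "$(curl -sf -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS/meta-data/network/interfaces/macs/$mac/device-number")" = "$index" ]; then
      echo "$mac"
      return 0
    fi
  done
  return 1
}

# Print the local interface whose MAC address equals $1.
# $2 optionally overrides the sysfs root (useful for testing).
iface_for_mac() {
  local mac=$1 root=${2:-/sys/class/net} dev
  for dev in "$root"/*; do
    if [ -f "$dev/address" ] && [ "$(cat "$dev/address")" = "$mac" ]; then
      basename "$dev"
      return 0
    fi
  done
  return 1
}
```

In snat.sh this could stand in for the hardcoded names, e.g. PRIVATE_IF=$(iface_for_mac "$(mac_for_device_index 1)"), with device index 0 giving the primary interface (ens5 above).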

If anyone else is interested in having this module work with Amazon Linux 2023, comment here and I'll submit a PR.

alexjeen commented 9 months ago

Hi,

You also need a slightly different runonce.sh:

#!/bin/bash -x

# attach the ENI
aws ec2 attach-network-interface \
  --region "$(/usr/bin/ec2-metadata -z  | sed 's/placement: \(.*\).$/\1/')" \
  --instance-id "$(/usr/bin/ec2-metadata -i | cut -d' ' -f2)" \
  --device-index 1 \
  --network-interface-id "${eni_id}"

# start SNAT
systemctl enable snat
systemctl start snat
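
The --region value above is derived by trimming the last character (the zone letter) off the availability zone that ec2-metadata -z reports. A small sketch, with hypothetical function names, that isolates that step and shows an alternative that reads the region directly from the IMDSv2 placement/region path:

```shell
#!/bin/bash
# Sketch: two ways to get the region on the instance. Function names are hypothetical.

# Same transformation as runonce.sh: "placement: us-east-1a" -> "us-east-1".
region_from_placement() {
  sed 's/placement: \(.*\).$/\1/'
}

# Ask IMDSv2 for the region instead of deriving it from the zone name.
region_from_imds() {
  local token
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/placement/region"
}
```

The trimming heuristic holds for standard zone names such as us-east-1a, but not for Local Zones: us-west-2-lax-1a becomes us-west-2-lax-1, which is not a region.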

And a different aws_ami data source:

data "aws_ami" "this" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
  filter {
    name   = "name"
    values = ["al2023-ami-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

alexjeen commented 8 months ago

This was a painful yak-shaving experience, and I am not sure it is all correct, but here are my working changes for Amazon Linux 2023.

Change the ENI role policy to the following:

resource "aws_iam_role_policy" "eni" {
  role        = aws_iam_role.this.name
  name_prefix = "${var.prefix}-${var.environment}-nat-eni"
  policy      = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachNetworkInterface",
                "ec2:ModifyInstanceAttribute"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

Change the data block for the AMI to:

data "aws_ami" "this" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
  filter {
    name   = "name"
    values = ["al2023-ami-2023*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

The al2023-ami-* filter in my earlier comment picked the minimal AMI, which caused me to lose access through Systems Manager.
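
Why the narrower pattern matters: EC2 name filters accept * and ? wildcards, and with most_recent = true the newest matching image wins, so al2023-ami-* can resolve to a minimal image. The sketch below approximates the filter with bash glob matching against two illustrative AMI names (sample values, not a live describe-images lookup).

```shell
#!/bin/bash
# Sketch: compare which AMI names each name-filter pattern matches.
# Bash globs stand in for EC2 filter wildcards; the names are illustrative samples.

matches() {
  local name=$1 pattern=$2
  # Unquoted right-hand side so [[ ]] treats $pattern as a glob.
  [[ $name == $pattern ]]
}

names=(
  "al2023-ami-2023.4.20240401.1-kernel-6.1-x86_64"
  "al2023-ami-minimal-2023.4.20240401.1-kernel-6.1-x86_64"
)

for pattern in "al2023-ami-*" "al2023-ami-2023*"; do
  echo "$pattern:"
  for name in "${names[@]}"; do
    if matches "$name" "$pattern"; then
      echo "  match     $name"
    else
      echo "  no match  $name"
    fi
  done
done
```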

Change the runonce.sh to this:

#!/bin/bash -x

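# Disable the source/destination check on the instance's primary interface
# (do this before attaching the second ENI; EC2 rejects this call on instances with multiple interfaces)
aws ec2 modify-instance-attribute \
  --region "$(/usr/bin/ec2-metadata -z | sed 's/placement: \(.*\).$/\1/')" \
  --instance-id "$(/usr/bin/ec2-metadata -i | cut -d' ' -f2)" \
  --no-source-dest-check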

# attach the ENI
aws ec2 attach-network-interface \
  --region "$(/usr/bin/ec2-metadata -z  | sed 's/placement: \(.*\).$/\1/')" \
  --instance-id "$(/usr/bin/ec2-metadata -i | cut -d' ' -f2)" \
  --device-index 1 \
  --network-interface-id "${eni_id}"

# Install iptables-services; iptables is not available by default on Amazon Linux 2023
sudo yum install -y iptables-services
sudo systemctl enable iptables
sudo systemctl start iptables

# start SNAT
systemctl enable snat
systemctl start snat

Change snat.sh to the following. I opted not to swap the default route between the interfaces, and instead just forward traffic with iptables:

#!/bin/bash -x

# wait for ens6
while ! ip link show dev ens6; do
  sleep 1
done

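# make this a NAT instance (a drop-in file, so reruns on every boot don't append duplicates)
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/90-nat-instance.conf
sudo sysctl -p /etc/sysctl.d/90-nat-instance.conf

# Configure iptables to forward packets; -C checks for an existing rule first,
# so rerunning this script does not add duplicates
sudo iptables -t nat -C POSTROUTING -o ens5 -j MASQUERADE 2>/dev/null ||
  sudo iptables -t nat -A POSTROUTING -o ens5 -j MASQUERADE
sudo iptables -C FORWARD -i ens6 -o ens5 -j ACCEPT 2>/dev/null ||
  sudo iptables -A FORWARD -i ens6 -o ens5 -j ACCEPT
sudo iptables -C FORWARD -i ens5 -o ens6 -m state --state RELATED,ESTABLISHED -j ACCEPT 2>/dev/null ||
  sudo iptables -A FORWARD -i ens5 -o ens6 -m state --state RELATED,ESTABLISHED -j ACCEPT
# remove the default REJECT rule (this fails harmlessly once it is already gone)
sudo iptables -D FORWARD -j REJECT --reject-with icmp-host-prohibited
sudo service iptables save

# ensure these settings persist across reboots (add the crontab entry only once)
grep -qF "iptables-restore < /etc/sysconfig/iptables" /etc/crontab ||
  echo "@reboot root iptables-restore < /etc/sysconfig/iptables" | sudo tee -a /etc/crontab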

# wait for network connection
curl --retry 10 http://www.google.com

# re-establish connections
systemctl restart amazon-ssm-agent

PS: I am not sure why it worked with Amazon Linux 2, but I think both ENIs need the source/destination check disabled. This module only disables it on the additional ENI (the one without the public IP). AFAIK there is no way to set it for the interface with the public IP, so it has to be done from the instance. Please correct me if I am wrong!

alexjeen commented 8 months ago

In addition, you might want to change the ENI attachment in runonce.sh to this:

# try to reattach the ENI (if you reboot, sometimes the old instance does not release the ENI in time)
max_attempts=10
attempt=0

while true; do
    aws ec2 attach-network-interface \
        --region "$(/usr/bin/ec2-metadata -z | sed 's/placement: \(.*\).$/\1/')" \
        --instance-id "$(/usr/bin/ec2-metadata -i | cut -d' ' -f2)" \
        --device-index 1 \
        --network-interface-id "${eni_id}" && break

    attempt=$((attempt + 1))

    if [ "$attempt" -ge "$max_attempts" ]; then
        echo "Maximum attempts reached. Initiating reboot."
        sudo reboot
        break
    fi

    echo "Attempt $attempt failed. Retrying..."
    sleep 5 # waits for 5 seconds before retrying
done

This is because if an instance shuts down and a new one starts quickly, the previous instance may not have released the ENI yet. The new NAT instance then never works, because it is unable to attach the ENI.
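
The loop above retries at a fixed 5-second interval. If the previous instance takes longer to release the ENI, exponential backoff spreads the same number of attempts over a longer window. A sketch of a generic helper (retry_with_backoff is a hypothetical name, not part of the module):

```shell
#!/bin/bash
# Sketch: retry a command with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, or 1 after MAX_ATTEMPTS failures.
retry_with_backoff() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))  # double the wait after each failure
  done
}

# Example (hypothetical) use in runonce.sh: 6 attempts with 5, 10, 20, 40 and
# 80 second waits between them, then reboot as the original loop does.
# REGION and INSTANCE_ID are assumed to be set earlier in the script.
#   retry_with_backoff 6 5 aws ec2 attach-network-interface \
#     --region "$REGION" --instance-id "$INSTANCE_ID" \
#     --device-index 1 --network-interface-id "${eni_id}" || sudo reboot
```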

webdevwilson commented 8 months ago

@alexjeen Thanks! That fixed my issues.