terraform-community-modules / tf_aws_vpc

[DEPRECATED] Use https://github.com/terraform-aws-modules/terraform-aws-vpc

nat gateway for multiple private subnets #42

Open babatundebusari opened 7 years ago

babatundebusari commented 7 years ago

As we know, NAT gateways are not cheap. Is there any way to have an option to use one NAT gateway for multiple private subnets for routing? It would be nice to have in this module.

Think of having something like a hundred private subnets. Now you get the point.

thanks

zot24 commented 7 years ago

However, if you do that you will lose high availability in your system. For example, if you have 3 private subnets, each in a different availability zone, your NAT will be in a public subnet in one of those availability zones, which means that if that specific availability zone goes down, your whole system goes down.

I get your point, though. I guess a solution could be to replace enable_nat_gateway with nat_gateway_count and then control the number of NATs that you want to create, making it 1 or the number of public subnets as needed. But the user will need to be aware of:

It's generally preferable to keep public_subnets, private_subnets, and azs to lists of the same length.

This module optionally creates NAT Gateways in each public subnet and sets them as the default gateways for the corresponding private subnets.
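A minimal sketch of the nat_gateway_count idea, written in the same pre-0.12 interpolation syntax as the rest of this thread. The variable nat_gateway_count is hypothetical, and the resource names (aws_eip.nateip, aws_nat_gateway.natgw, aws_subnet.public, aws_route_table.private) are assumptions about the module's internals rather than something this thread confirms.

```hcl
# Hypothetical variable: the module discussed here only exposes enable_nat_gateway.
variable "nat_gateway_count" {
  description = "How many NAT gateways to create; 0 disables them"
  default     = 0
}

resource "aws_eip" "nateip" {
  count = "${var.nat_gateway_count}"
  vpc   = true
}

# Gateways are placed in the first nat_gateway_count public subnets.
resource "aws_nat_gateway" "natgw" {
  count         = "${var.nat_gateway_count}"
  allocation_id = "${element(aws_eip.nateip.*.id, count.index)}"
  subnet_id     = "${element(aws_subnet.public.*.id, count.index)}"
}

# One default route per private route table. element() wraps around the
# gateway list, so fewer gateways than route tables is fine.
resource "aws_route" "private_nat_gateway" {
  count                  = "${var.nat_gateway_count > 0 ? length(var.private_subnets) : 0}"
  route_table_id         = "${element(aws_route_table.private.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.natgw.*.id, count.index)}"
}
```

With nat_gateway_count = 1 every private route table would point at the single gateway, which is the cheap, non-HA setup asked for above. Setting it to the number of public subnets would give one gateway per AZ, provided the three lists are kept the same length.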

nap commented 7 years ago

Speaking of HA, consider the following

azs                 = "us-east-1a,us-east-1b"
public_subnets      = "10.0.10.0/24,10.0.20.0/24"
# Note alternating 3rd byte. 10s are in azs[0] and 20s are in azs[1]
# List index wrap, so 1st and 3rd item will be in azs[0]
private_subnets     = "10.0.12.0/24,10.0.22.0/24,10.0.14.0/24,10.0.24.0/24"

It will generate one public and two private subnets in each availability zone. But it also generates four NAT gateways, which is wrong.

The AWS documentation is clear about this: NAT gateways are bound to public subnets. Even though you have 4 private subnets, you should still have only two NAT gateways if you have two public subnets for two availability zones.

To create a NAT gateway, you must specify the public subnet in which the NAT gateway will reside. 
[...] After you've created a NAT gateway, you must update the route table associated with one or 
more of your private subnets to point Internet-bound traffic to the NAT gateway. This enables 
instances in your private subnets to communicate with the Internet.

SEE: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-basics

Considering that, if multiple private subnets are in the same AZ, they can share the same route table (RT<-(1..*)-SN bound). That route table can point 0.0.0.0/0 traffic to the NAT Gateway instance in the public subnet of that availability zone. This means HA with multiple private SNs, one public SN, and one NGW per AZ. I don't see the requirement for having multiple NGWs per AZ. You really only need one NGW per AZ iff that AZ has a public SN.
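The shared route table layout described above could be sketched roughly like this, in the thread's pre-0.12 syntax. It assumes one public subnet (and so one NAT gateway) per AZ, private subnets assigned to AZs by index wrap as in the example above, and resource names (aws_vpc.mod, aws_subnet.private, aws_nat_gateway.natgw) that are guesses at the module's internals.

```hcl
# One private route table per AZ instead of one per private subnet.
resource "aws_route_table" "private" {
  count  = "${length(var.azs)}"
  vpc_id = "${aws_vpc.mod.id}"
}

# Each AZ's route table sends Internet-bound traffic to that AZ's NAT gateway.
resource "aws_route" "private_nat_gateway" {
  count                  = "${length(var.azs)}"
  route_table_id         = "${element(aws_route_table.private.*.id, count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = "${element(aws_nat_gateway.natgw.*.id, count.index)}"
}

# Every private subnet is associated with the route table of its own AZ.
# element() wraps, so subnet i gets route table i modulo the number of AZs,
# mirroring how the subnets themselves were spread over the AZs.
resource "aws_route_table_association" "private" {
  count          = "${length(var.private_subnets)}"
  subnet_id      = "${element(aws_subnet.private.*.id, count.index)}"
  route_table_id = "${element(aws_route_table.private.*.id, count.index)}"
}
```

For the four private subnets in the example, this would give two route tables and two NAT gateways, with the 1st and 3rd subnets sharing one table and the 2nd and 4th sharing the other.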

nap commented 7 years ago

The code associated with this issue is pretty obvious

count = "${length(var.private_subnets) * lookup(map(var.enable_nat_gateway, 1), "true", 0)}"

But it should be

count = "${length(var.public_subnets) * lookup(map(var.enable_nat_gateway, 1), "true", 0)}"

The same issue applies to the EIPs that are generated for each NAT Gateway.
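One way to keep the EIP and NAT gateway counts from drifting apart might be to compute the count once and reuse it. This is only a sketch: the local value nat_gateway_count and the resource name aws_eip.nateip are assumptions, while the expression itself is the corrected one quoted above.

```hcl
locals {
  # map(var.enable_nat_gateway, 1) yields {"true" = 1} only when the flag is
  # "true"; any other value falls through to the lookup default of 0.
  nat_gateway_count = "${length(var.public_subnets) * lookup(map(var.enable_nat_gateway, 1), "true", 0)}"
}

# One EIP per NAT gateway. The aws_nat_gateway resource would use the same
# local value for its own count.
resource "aws_eip" "nateip" {
  count = "${local.nat_gateway_count}"
  vpc   = true
}
```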

dyindude commented 7 years ago

To me it makes more sense to set the number of NATGWs to the number of AZs that you are using. I have trouble thinking of a use case where you would need to have more than one NATGW per AZ.

With the current code, I worked around this by setting enable_nat_gateway = "false" and attaching routes to my own aws_nat_gateway resources:

module "vpc" {
  source = "github.com/terraform-community-modules/tf_aws_vpc"
  name = "${var.project_name}-vpc"
  cidr = "${var.vpc_cidr}"
  enable_dns_support = "true"
  private_subnets = ["${cidrsubnet(var.vpc_cidr,6,0)}",
                     "${cidrsubnet(var.vpc_cidr,6,4)}",
                     "${cidrsubnet(var.vpc_cidr,6,8)}",
                     "${cidrsubnet(var.vpc_cidr,6,1)}",
                     "${cidrsubnet(var.vpc_cidr,6,5)}",
                     "${cidrsubnet(var.vpc_cidr,6,9)}",
                     "${cidrsubnet(var.vpc_cidr,6,2)}",
                     "${cidrsubnet(var.vpc_cidr,6,6)}",
                     "${cidrsubnet(var.vpc_cidr,6,10)}"]

  public_subnets = ["${cidrsubnet(var.vpc_cidr,6,3)}",
                    "${cidrsubnet(var.vpc_cidr,6,7)}",
                    "${cidrsubnet(var.vpc_cidr,6,11)}"]
  enable_nat_gateway = "false"
  azs = ["${var.aws_region}a", "${var.aws_region}b", "${var.aws_region}c"]
}
//the community module ends up creating 9 nat gateways, one per private subnet
//this will create 3, one per public subnet, then allocate routes to the appropriate subnets
resource "aws_eip" "gw" {
  count = "3"
  vpc   = true
}
resource "aws_nat_gateway" "gw" {
  count = "3"
  allocation_id = "${element(aws_eip.gw.*.id,count.index)}"
  subnet_id = "${element(module.vpc.public_subnets,count.index)}"
}
//one route per private route table: 9 private subnets means 9 tables, and a
//higher count would wrap around and try to create duplicate default routes
resource "aws_route" "gw" {
  count = "9"
  route_table_id = "${element(module.vpc.private_route_table_ids,count.index)}"
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id = "${element(aws_nat_gateway.gw.*.id,count.index%3)}"
}

In PR #44 the NAT gateways are determined by number of AZs and the routes are determined using logic similar to this workaround.

nap commented 7 years ago

You never need more than one NGW per AZ if you set up your routes properly (each private SN, if required, should route 0.0.0.0/0 to the NGW). But you don't need an NGW if you don't have a public SN.

fabriziomoscon commented 7 years ago

I second @babatundebusari's concerns about price, and also @zot24's HA remark. I found this module when I started setting up my own VPC with Terraform, and I greatly appreciate the community effort that went into putting together a very knowledgeable and efficient configuration.

However, it would not hurt to specify in the README that the number of NAT gateways IS GOING TO highly impact your final AWS bill compared to the rest of the resources used in this module, which are next to free.

I understand this issue is addressing 2 things:

My case is slightly different, but I think common to many people, especially those starting with small infrastructure requirements. Despite the small scale of my project I would like to have multiple AZs (2 or 3) to provide a level of fault tolerance, but as of today, and considering the scale of the project, I can't justify $35 a month per AZ ($70 or $105 respectively), especially when it is by far the biggest billed item.

I would like to hear some expert advice on using NAT instances (t2.micro or even t2.nano) as a cheaper alternative to NAT gateways. It would be great to hear pros and cons so I can better understand. Any answer would be greatly appreciated.

I found that this module helps you create a NAT instance: https://github.com/terraform-community-modules/tf_aws_nat

zot24 commented 7 years ago

If price is your concern, I wouldn't bother with a NAT. I'd secure my instances with SGs and NACLs in a public subnet.

If someone finds it useful, we have created a stack of Terraform modules, moltin/terraform-stack, and individual modules, moltin/terraform-modules. However, it might not be helpful for you, @fabriziomoscon, as it creates a NAT per private subnet and therefore a public subnet per private one, but people could at least find good examples there.

fabriziomoscon commented 7 years ago

Ok, I think you wanted to write that those repos are NOT useful to me. I had a wander inside anyway :) Good stuff. I have put SGs and NACLs on my private subnets. I am trying the NAT instance way; if that doesn't work I will fall back on your suggestion.

nap commented 7 years ago

The NAT Gateway that AWS provides you with is an instance with multihoming (or IP aliasing) and port forwarding configured. You can do the same with a regular AWS instance (a while ago, NAT Gateways did not exist). The size of the instance you choose depends on the traffic your instance will have to manage. If that instance is too small, it will become a network throughput bottleneck.

NAT is often perceived (wrongly) as a security construct. But having pessimistic NACLs and SGs will provide you with more security than a NAT with lax NACLs and SGs.

dyindude commented 7 years ago

@fabriziomoscon Since this module provides a private_route_table_ids output, you could deploy a VPC with this module with enable_nat_gateway = "false" and then add a route to each of module.vpc.private_route_table_ids targeting the aws_instance you have created

resource "aws_route" "my-nat-instance" {
  route_table_id         = "${element(module.vpc.private_route_table_ids,count.index)}"
  count                  = "${length(module.vpc.private_route_table_ids)}"
  destination_cidr_block = "0.0.0.0/0"
  instance_id            = "${aws_instance.my-nat-instance.id}"
}

I haven't tested the above, but it should give you an idea - you'd still need to make sure your aws_instance resource has source/destination check disabled, launches a NAT instance AMI, etc.

fabriziomoscon commented 7 years ago

@dyindude thanks! I have set up 2 NAT instances in my public subnets using this configuration:

resource "aws_route" "my_nat_instance" {
    count                  = "${length(module.vpc.private_route_table_ids)}"
    route_table_id         = "${element(module.vpc.private_route_table_ids, count.index)}"
    destination_cidr_block = "0.0.0.0/0"
    instance_id            = "${element(aws_instance.nat.*.id, count.index)}"
}

resource "aws_security_group" "nat_instance_sg" {
    description = "Nat instance security group"
    vpc_id      = "${module.vpc.vpc_id}"
    name        = "nat-sg"

    # opens inbound traffic from the Internet for http and https
    ingress {
        protocol    = "tcp"
        from_port   = 80
        to_port     = 80
        cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
        protocol  = "tcp"
        from_port = 22
        to_port   = 22
        security_groups = [
            "${aws_security_group.bastion_sg.id}",
        ]
    }

    # opens outbound traffic to the Internet for http and https
    egress {
        protocol    = "tcp"
        from_port   = 80
        to_port     = 80
        cidr_blocks = ["0.0.0.0/0"]
    }

    egress {
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["0.0.0.0/0"]
    }

    tags {
        Name        = "sg-nat"
        Terraform   = "true"
        Environment = "${var.environment}"
    }
}

resource "aws_instance" "nat" {
    count             = "${length(var.aws_azs)}"
    ami               = "ami-5bc6c23d" # amzn-ami-vpc-nat-hvm-2017.03.0.20170417-x86_64-ebs
    instance_type     = "t2.micro"
    source_dest_check = false
    key_name          = "${var.key_name}"
    subnet_id         = "${element(module.vpc.public_subnets, count.index)}"
    # instances in a non-default VPC take security group IDs via vpc_security_group_ids
    vpc_security_group_ids = ["${aws_security_group.nat_instance_sg.id}"]
    monitoring        = true
    tags {
        Name        = "${format("nat-%d", count.index+1)}"
        Terraform   = "true"
        Environment = "${var.environment}"
    }

    lifecycle {
        create_before_destroy = true
    }
}

resource "aws_eip" "nateip" {
    instance   = "${element(aws_instance.nat.*.id, count.index)}"
    vpc        = true
    count      = "${length(var.aws_azs)}"
    depends_on = ["aws_instance.nat"]
}

And it works!
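As a possible tightening of the security group above, the HTTP and HTTPS ingress rules could be limited to traffic coming from inside the VPC, since a NAT instance only needs to accept those connections from the private subnets it serves. This is an untested sketch: var.vpc_cidr is assumed to hold the VPC CIDR block and is not defined in the configuration above, and the resource name is made up for illustration. The SSH rule from the bastion would stay as it is.

```hcl
# Hypothetical variant of nat_instance_sg with ingress limited to the VPC.
resource "aws_security_group" "nat_instance_sg_restricted" {
    description = "Nat instance security group, ingress limited to the VPC"
    vpc_id      = "${module.vpc.vpc_id}"
    name        = "nat-sg-restricted"

    # accepts http and https only from addresses inside the VPC
    ingress {
        protocol    = "tcp"
        from_port   = 80
        to_port     = 80
        cidr_blocks = ["${var.vpc_cidr}"]
    }

    ingress {
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["${var.vpc_cidr}"]
    }

    # forwards http and https to the Internet on behalf of the private subnets
    egress {
        protocol    = "tcp"
        from_port   = 80
        to_port     = 80
        cidr_blocks = ["0.0.0.0/0"]
    }

    egress {
        protocol    = "tcp"
        from_port   = 443
        to_port     = 443
        cidr_blocks = ["0.0.0.0/0"]
    }
}
```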

aprilmintacpineda commented 1 year ago

I have a GitHub repo that I'm using for my Terraform studies. In this repo I am creating 1 NAT gateway for each public subnet in each availability zone. In this case I only have 1 public subnet and 1 private subnet per availability zone, and 2 availability zones, for a total of 2 NAT gateways. The part that will catch your attention is when you look at the billing the next day: the NAT gateways literally cost more than the ECS containers.

My NAT consumed 3.85 USD but my ECS consumed only 0.10 USD. I mean, what gives?

Description | Usage Quantity | Amount in USD

Elastic Compute Cloud | | USD 3.85
  Asia Pacific (Singapore) | | USD 3.85
    Amazon Elastic Compute Cloud NatGateway | | USD 3.82
      $0.059 per GB Data Processed by NAT Gateways | 5.775 GB | USD 0.34
      $0.059 per NAT Gateway Hour | 59 Hrs | USD 3.48
    Amazon Elastic Compute Cloud running Linux/UNIX | | USD 0.03
      $0.0146 per On Demand Linux t2.micro Instance Hour | 2.306 Hrs | USD 0.03
    EBS | | USD 0.00
      $0.096 per GB-month of General Purpose (gp3) provisioned storage - Asia Pacific (Singapore) | 0.027 GB-Mo | USD 0.00
      $0.12 per GB-month of General Purpose SSD (gp2) provisioned storage - Asia Pacific (Singapore) | 0 GB-Mo | USD 0.00
    Elastic IP Addresses | | USD 0.00
      $0.00 per Elastic IP address not attached to a running instance for the first hour | 0.8 Hrs | USD 0.00

Elastic Container Service | | USD 0.10
  Asia Pacific (Singapore) | | USD 0.10
    Amazon Elastic Container Service APS1-Fargate-GB-Hours | | USD 0.02
      AWS Fargate - Memory - Asia Pacific (Singapore) | 3.164 hours | USD 0.02
    Amazon Elastic Container Service APS1-Fargate-vCPU-Hours:perCPU | | USD 0.08
      AWS Fargate - vCPU - Asia Pacific (Singapore) | 1.582 hours | USD 0.08
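The line items explain the gap: the hourly charge accrues for every hour a gateway exists, whether or not any traffic flows through it. Checking the arithmetic against the rates shown in the bill, and then, purely as a projection, assuming two gateways left running for a 30-day month at the same hourly rate (data processing excluded):

```latex
\begin{align*}
\text{hourly charge} &= 59\ \text{h} \times \$0.059/\text{h} \approx \$3.48 \\
\text{data processing} &= 5.775\ \text{GB} \times \$0.059/\text{GB} \approx \$0.34 \\
\text{NAT gateway total} &\approx \$3.48 + \$0.34 = \$3.82 \\
\text{projected month, 2 gateways} &= 2 \times 24 \times 30\ \text{h} \times \$0.059/\text{h} = \$84.96
\end{align*}
```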

nap commented 1 year ago

@aprilmintacpineda given your use case, you'd probably be better off with a NAT instance instead of a NAT Gateway. You could select another cheap t2.micro for your NAT. Just make sure not to use that setup if you'll be running any important network traffic through it.

TL;DR: NAT instances allow you to select the instance type you want, because NATing can be achieved with a few kernel and network configuration changes on a plain old Linux box.

aprilmintacpineda commented 1 year ago

@nap thanks for the tip!