remind101 / stacker_blueprints

DEPRECATED - moved to:
https://github.com/cloudtools/stacker_blueprints
BSD 2-Clause "Simplified" License

Add support for using NAT Gateway over NAT instances #10

Closed mwildehahn closed 8 years ago

mwildehahn commented 8 years ago

This will only use the new NAT Gateway if UseNatGateway is provided.
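To illustrate the opt-in behaviour, here is a rough sketch in plain Python that emits CloudFormation-style resources. It is not the code in this PR: the function name `nat_resources` and the logical IDs (`NatEIP`, `PrivateRouteTable`, `NatImageId`, `NatInstanceType`, `PublicSubnet0`) are hypothetical, and only the `UseNatGateway` idea comes from the change itself.

```python
def nat_resources(use_nat_gateway, public_subnet_ref="PublicSubnet0"):
    """Return CloudFormation resources for outbound NAT plus the private default route.

    All logical IDs here are hypothetical and chosen only for illustration.
    """
    route_properties = {
        "RouteTableId": {"Ref": "PrivateRouteTable"},
        "DestinationCidrBlock": "0.0.0.0/0",
    }

    if use_nat_gateway:
        # Opt-in path: a managed NAT Gateway with its own Elastic IP.
        route_properties["NatGatewayId"] = {"Ref": "NatGateway"}
        resources = {
            "NatEIP": {"Type": "AWS::EC2::EIP", "Properties": {"Domain": "vpc"}},
            "NatGateway": {
                "Type": "AWS::EC2::NatGateway",
                "Properties": {
                    "AllocationId": {"Fn::GetAtt": ["NatEIP", "AllocationId"]},
                    "SubnetId": {"Ref": public_subnet_ref},
                },
            },
        }
    else:
        # Default path: keep the existing NAT instance behaviour.
        route_properties["InstanceId"] = {"Ref": "NatInstance"}
        resources = {
            "NatInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "NatImageId"},
                    "InstanceType": {"Ref": "NatInstanceType"},
                    "SubnetId": {"Ref": public_subnet_ref},
                    # A NAT instance must forward traffic not addressed to itself.
                    "SourceDestCheck": False,
                },
            },
        }

    resources["PrivateDefaultRoute"] = {
        "Type": "AWS::EC2::Route",
        "Properties": route_properties,
    }
    return resources
```

Calling `nat_resources(False)` keeps the NAT instance and its route, so nothing changes for anyone who does not pass the option.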

One thing I'm wondering is if we should add support for creating Bastion hosts if the NAT Gateway is used. Thoughts?

phobologic commented 8 years ago

This should fix #8. @cyommer - want to take a look since you've played with this a bit? Want to make sure this is a safe upgrade for people.

mwildehahn commented 8 years ago

Given this is an explicit option you have to pass, can we merge this?

We can add a comment in the README about the Redshift issue, but I don't think that should impact whether or not stacker allows you to use this.

cyommer commented 8 years ago

I apologize for the delay - I haven't had time to test this. My version of this is about 90% complete, but for some reason existing resources (gw instances) on upgrades aren't torn down properly, and some of the variables are being ignored. I haven't had time to test whether or not this PR suffers from the same sort of upgrade issues. I'll be returning to this soon, but I'm still a few days out, at least.

Reading back through the comments, for the question about bastion hosts, isn't that already part of the blueprint? Also, what is the Redshift issue?

phobologic commented 8 years ago

So we actually ran with a template like this internally (we have a slightly divergent VPC template due to legacy issues) for about 2 weeks, though we slowly changed the # of NAT GWs we were using over that period.

When we finished rolling out the last one, our team that uses Redshift was seeing lost connections. There was a lot of back and forth with AWS (originally they thought it was an MTU problem, but that didn't prove to be the case) and we found out that NAT GWs have a hard-set timeout of 5 minutes that the NAT instances did not have.

We plan to dive into this more with them in the future, but it's something worth keeping in mind.
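If the drops turn out to be idle connections hitting that timeout, one common mitigation is to enable TCP keepalives with an idle interval shorter than the timeout. The following is an untested-against-Redshift sketch, not something from this PR: the helper name `enable_keepalive` and the default values are assumptions, and the `TCP_KEEP*` options are platform-specific (available on Linux), so they are guarded with `hasattr`.

```python
import socket


def enable_keepalive(sock, idle_seconds=120, interval_seconds=30, probes=3):
    """Turn on TCP keepalive so an otherwise idle connection still sends probes.

    idle_seconds should be comfortably below the NAT timeout (assumed ~5 minutes).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained knobs are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Many database drivers expose equivalent keepalive settings of their own, which is usually preferable to touching the socket directly.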

That said, we tested the rollout of NAT GW to a NAT instance environment, and we only had brief blips (when the route is changed, all existing connections fail, for obvious reasons). I'm good with this.