masterpointio / terraform-aws-tailscale

Terraform module to provision a Tailscale Subnet Router into your AWS VPC
Apache License 2.0

Removal of device from Tailscale upon EC2 instance termination #22

Open oeed opened 2 months ago

oeed commented 2 months ago

Thanks again for this great module!

We have an SSM automation that updates the launch template with the latest AMI every week so that we stay on top of Inspector alerts. As a result, Tailscale accumulates stale machines (i.e. previously terminated EC2 instances) in its list over time. It also means the replacement machines get a -2, -3, etc. suffix rather than the desired name.

What would be awesome is a cleanup script in the launch template that automatically removes the machine from Tailscale upon termination.

Gowiem commented 2 months ago

@oeed that would be awesome and we'd be all for it. We have this same problem with our own AMI auto-update automation. Sadly, I don't know off the top of my head how we can easily solve this. Is there a termination hook script that can be executed for EC2 instances?

oeed commented 1 month ago

I haven't done it personally before, but in doing some research it does seem like ASG lifecycle management hooks would be suitable for this: https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html

It does seem fairly straightforward to declare, and then you can simply put the script in user_data.

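# Pauses termination so a cleanup step can run before the instance goes away.
resource "aws_autoscaling_lifecycle_hook" "termination_hook" {
  name                   = "instance-termination-hook"
  autoscaling_group_name = aws_autoscaling_group.example.name
  default_result         = "CONTINUE" # terminate anyway once the timeout elapses
  heartbeat_timeout      = 300        # seconds to wait for the cleanup to finish
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
}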

This is what Claude suggested, but obviously can't vouch for how accurate it is: https://claude.site/artifacts/251c7774-b974-4a5b-ad54-3582796bfc2d
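
A lifecycle hook on its own only pauses termination; something still has to react to it, and user_data normally runs at first boot rather than at shutdown. One possible way to wire that up, as a rough sketch, is an EventBridge rule that forwards the termination event to a cleanup function. Here aws_autoscaling_group.example is taken from the snippet above, while aws_lambda_function.tailscale_cleanup and all resource names are hypothetical and not defined in this thread.

```hcl
# Sketch only: routes the ASG termination lifecycle event to a cleanup function.
# aws_lambda_function.tailscale_cleanup is hypothetical and must be defined elsewhere.
resource "aws_cloudwatch_event_rule" "termination" {
  name = "tailscale-router-terminating"

  # Match the event Auto Scaling emits when a termination hook fires.
  event_pattern = jsonencode({
    source        = ["aws.autoscaling"]
    "detail-type" = ["EC2 Instance-terminate Lifecycle Action"]
    detail = {
      AutoScalingGroupName = [aws_autoscaling_group.example.name]
    }
  })
}

resource "aws_cloudwatch_event_target" "cleanup" {
  rule = aws_cloudwatch_event_rule.termination.name
  arn  = aws_lambda_function.tailscale_cleanup.arn
}

# Allow EventBridge to invoke the (hypothetical) cleanup function.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.tailscale_cleanup.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.termination.arn
}
```

The function would presumably delete the device through the Tailscale API and then complete the lifecycle action so the instance does not sit in the waiting state for the full heartbeat timeout.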

Gowiem commented 1 month ago

@oeed very cool and a good use of AI. Claude does make that seem kinda simple... I'm a little bit worried about that being the default experience, but if it were enabled by the module consumer then I'd be all for it.

Would you be interested in contributing this functionality? We can kick the tires and test it out prior to merging if so.

Gowiem commented 1 month ago

We discussed this internally at Masterpoint, and @gberenice was already aware of this issue. She shared this issue from the TS provider, which is worth tracking as well: https://github.com/tailscale/terraform-provider-tailscale/issues/68

Gowiem commented 1 month ago

@oeed -- @gberenice is going to add an extra_tailscaled_flags variable which will enable us to pass --state=mem: to the tailscaled command and create an ephemeral node, which solves this problem. It also comes with drawbacks... But we'll test it out internally, see how it goes, and report back when we can.

oeed commented 1 month ago

Okay great! I'll probably be freed up in about a week to contribute, so if there is something you'd like me to do just let me know.

gberenice commented 1 month ago

@oeed I've tested the extra flags and came to the conclusion that the best way is just to set the variable ephemeral to true. You can set the flag --state=mem: and create a node without a key, but if you don't provide the key, you must authenticate by logging in at the URL provided in the logs. I believe this is not what we all want :)

Ephemeral nodes are supposed to be short-lived devices, and I have some doubts about whether they're a good choice for a subnet router. We're going to test this for a while with one of our customers before we can strongly recommend this approach.
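
For anyone who wants to try the ephemeral approach in the meantime, a minimal sketch of the module call might look like the following. Only the ephemeral variable is confirmed in this thread; the registry address, the other input names, and all values are assumptions for illustration and should be checked against the module's README.

```hcl
# Sketch only: enables ephemeral registration for the subnet router.
module "tailscale_subnet_router" {
  source = "masterpointio/tailscale/aws" # assumed registry address

  # Assumed input names with placeholder values, not taken from this thread.
  vpc_id           = "vpc-0123456789abcdef0"
  subnet_ids       = ["subnet-0123456789abcdef0"]
  advertise_routes = ["10.0.0.0/16"]

  # The variable discussed above: registers the router as an ephemeral node.
  ephemeral = true
}
```

With this setting, a terminated instance should drop out of the Tailscale machine list on its own after going offline, subject to the caveats about ephemeral subnet routers raised above.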