sebdah / aws-ec2-assign-elastic-ip

Automatically assign Elastic IPs to AWS EC2 Auto Scaling Group instances
Apache License 2.0
175 stars 58 forks source link

Race condition results in EIP association failing #24

Closed brodygov closed 6 years ago

brodygov commented 6 years ago

The EC2 API for associating EIPs appears to be eventually consistent. So if two servers attempt to grab the same EIP at the same time, one of them will succeed and the other will fail but think it succeeded.

Example logs from this happening, with the IP and instance IDs redacted.

2017-11-27 19:50:02,225 - aws-ec2-assign-eip - INFO - Connected to AWS EC2 in us-west-2
2017-11-27 19:50:02,668 - aws-ec2-assign-eip - INFO - Successfully associated Elastic IP x.x.x.105 with i-a77fbf
2017-11-27 19:50:02,064 - aws-ec2-assign-eip - INFO - Connected to AWS EC2 in us-west-2
2017-11-27 19:50:02,909 - aws-ec2-assign-eip - INFO - Successfully associated Elastic IP x.x.x.105 with i-8d7ed1

In reality, i-8d7ed1 won the race and has the EIP associated, while i-a77fbf thinks it got the EIP but has no EIP associated.

One way to fix this would be to loop doing the describe-addresses call until the association with the current instance ID appears. Would you be interested in a PR that does this?

brodygov commented 6 years ago

Following up here with discussion from the PR:

This issue is actually more like a bug/wart in boto2, which doesn't set the AllowReassociation parameter and doesn't provide any way to explicitly set that parameter to false. Without it, AWS defaults to true, allowing an EIP to be reassociated even if it is currently associated with another instance.

https://github.com/ab/boto/commit/239013bfef349763ac9ef73cc67bba0e0eb4adda is a start at fixing this for boto 2.

kylemacfarlane commented 6 years ago

I think the billing issue I said could theoretically exist in the PR thread does already exist.

Imagine two instances fire up due to a scaling event and both have a cron to check they have an Elastic IP every minute. Once booted they will both race for the same Elastic IP as shown in @brodygov's log.

With an EC2-Classic account (pre-2014) only one instance will manage to associate and the other will error. This will incur a $0.10 charge. In the second minute the second instance will associate and incur another $0.10 charge. Total charges are $0.20.

With an EC2-VPC-only account (2014+) both instances could successfully associate in the first minute and incur $0.20 of charges. In the second minute the instance that was stolen from will associate again and incur another $0.10 charge. Total charges are $0.30.

Not a big difference but it escalates quickly. If the bad luck was repeated with 10 instances it would take 10 minutes to properly associate every instance and the difference by then is $1.00 vs $5.50.

I would definitely upgrade to boto 3 and explicitly set AllowReassociation to False for everyone.

If requested features that would require reassociation (e.g. #13 or #19) are ever implemented then they should be done so carefully.

wilerson commented 6 years ago

As this issue affected my use case, I went ahead and made a PR - If #13 or #19 are ever implemented, the implementers could add an argument for AllowReassociation.

sebdah commented 6 years ago

The above PR is merged and released with 0.8.0. Please reopen this issue if needed.

Thanks @wilerson for the PR.