hashicorp / packer-plugin-amazon

Packer plugin for Amazon AMI Builder
https://www.packer.io/docs/builders/amazon
Mozilla Public License 2.0

When using the `most_free` subnet filter, stuck in the same subnet/availability zone even if spot capacity request fails. #425

Open teddylear opened 1 year ago

teddylear commented 1 year ago


Description

Currently, when using the `most_free` subnet filter, builds run just fine most of the time. However, when the availability zone (AZ) of the selected subnet has no spot capacity matching the request, the build fails (if you are using spot instances for builds). Re-runs then hit the same issue for as long as that subnet still has the most free IPs and its AZ still lacks spot capacity. Ideally there would be a flag that allows a number of retries, with each retry trying another subnet (and therefore, hopefully, another AZ if the other subnet is in a different one). This way Packer tries the subnet with the most free IPs first, then works through the list of remaining subnets if the current one fails due to capacity issues. I'm not sure of the best way to do this; I'm open to any ideas and more than willing to implement it as well.
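To make the proposed behaviour concrete, here is a minimal Go sketch of the retry-with-fallback idea. It is not the plugin's code: the `Subnet` type, the `ErrNoSpotCapacity` sentinel, the `launchWithFallback` function, and the `maxAttempts` parameter are all hypothetical names introduced for illustration, and the `launch` callback stands in for whatever actually requests the spot instance.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Subnet is a hypothetical, simplified view of a candidate subnet.
type Subnet struct {
	ID      string
	AZ      string
	FreeIPs int
}

// ErrNoSpotCapacity is a hypothetical sentinel standing in for whatever
// capacity error the real launch call would report.
var ErrNoSpotCapacity = errors.New("no spot capacity in availability zone")

// launchWithFallback tries subnets in order of free IPs (most first) and
// moves on to the next one only when the failure is a capacity error.
// At most maxAttempts subnets are tried.
func launchWithFallback(subnets []Subnet, maxAttempts int, launch func(Subnet) error) (Subnet, error) {
	// Copy before sorting so the caller's slice is left untouched.
	candidates := append([]Subnet(nil), subnets...)
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].FreeIPs > candidates[j].FreeIPs
	})

	var lastErr error
	for i, s := range candidates {
		if i >= maxAttempts {
			break
		}
		err := launch(s)
		if err == nil {
			return s, nil
		}
		if !errors.Is(err, ErrNoSpotCapacity) {
			// Any other failure is not worth retrying elsewhere.
			return Subnet{}, err
		}
		lastErr = err
	}
	if lastErr == nil {
		lastErr = errors.New("no subnets were tried")
	}
	return Subnet{}, fmt.Errorf("all attempted subnets failed: %w", lastErr)
}

func main() {
	subnets := []Subnet{
		{ID: "subnet-a", AZ: "us-east-1a", FreeIPs: 200},
		{ID: "subnet-b", AZ: "us-east-1b", FreeIPs: 50},
	}
	// Simulated launcher: us-east-1a has no spot capacity.
	launch := func(s Subnet) error {
		if s.AZ == "us-east-1a" {
			return ErrNoSpotCapacity
		}
		return nil
	}
	chosen, err := launchWithFallback(subnets, 2, launch)
	fmt.Println(chosen.ID, err)
}
```

With `maxAttempts` set to 1 this degenerates to the current behaviour (only the subnet with the most free IPs is tried), which is one way such a flag could stay backwards compatible.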

Use Case(s)

See the description above.

Potential configuration

n/a

lbajolet-hashicorp commented 1 year ago

Hi @teddylear,

This looks like a good improvement for the plugin indeed! I've looked into the problem, and I fear it may not be trivial to implement at the moment.

We don't currently work with a list of candidate subnets: we choose the VPC/AZ/Subnet before booting the instance. If we want fallback combinations, we need to rethink that approach, so this becomes a rather large chunk of work and testing to make sure we don't break the user experience.

I tried improving our chances with Spot instances by looking at the `GetSpotPlacementScores` API, which we could use as another parameter for picking the VPC/AZ/Subnet. However, since we only specify one instance type in the configuration, I failed to see any difference in scoring. As AWS documents, specifying only one instance type always yields low scores, so this may also require changes in order to address this intelligently.
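As a rough illustration of using a placement score as an extra picking parameter, the sketch below ranks candidates by a per-AZ score first and by free IPs second. Everything here is an assumption for illustration: the `candidate` type, the `rankCandidates` function, and the `scoreByAZ` map are hypothetical, and the scores are assumed to have been fetched elsewhere (for example from `GetSpotPlacementScores`) rather than retrieved by this code.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified subnet record.
type candidate struct {
	SubnetID string
	AZ       string
	FreeIPs  int
}

// rankCandidates orders candidates by placement score of their AZ
// (higher first), breaking ties by free IPs (more first). AZs missing
// from scoreByAZ are treated as score 0.
func rankCandidates(cands []candidate, scoreByAZ map[string]int) []candidate {
	// Copy before sorting so the caller's slice is left untouched.
	ranked := append([]candidate(nil), cands...)
	sort.SliceStable(ranked, func(i, j int) bool {
		si, sj := scoreByAZ[ranked[i].AZ], scoreByAZ[ranked[j].AZ]
		if si != sj {
			return si > sj
		}
		return ranked[i].FreeIPs > ranked[j].FreeIPs
	})
	return ranked
}

func main() {
	cands := []candidate{
		{SubnetID: "subnet-a", AZ: "us-east-1a", FreeIPs: 200},
		{SubnetID: "subnet-b", AZ: "us-east-1b", FreeIPs: 50},
		{SubnetID: "subnet-c", AZ: "us-east-1b", FreeIPs: 120},
	}
	// Made-up scores for the example, not real API output.
	scores := map[string]int{"us-east-1a": 3, "us-east-1b": 7}
	for _, c := range rankCandidates(cands, scores) {
		fmt.Println(c.SubnetID, c.AZ, c.FreeIPs)
	}
}
```

If every AZ comes back with the same low score, as described above for a single instance type, this ordering collapses to plain `most_free` behaviour, so the score would only help once more than one instance type can be specified.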

Alternatively, maybe the VPC/Subnet/AZ picking logic does not apply well to Spot instances, and we should rethink our approach for them too.

I'm open to suggestions on this. If you want to experiment with it, please feel free to do so; we'll gladly review!