taskcluster / ec2-manager

Mozilla Public License 2.0
2 stars 14 forks source link

Implement new spot request model #49

Closed jhford closed 6 years ago

jhford commented 6 years ago

This PR implements the changes needed for us to use the new spot model. In this model, spot requests are made by using the on-demand runInstances endpoint with an extra parameter which provides information about how the spot request should be handled. No attempt has been made to have a clean cutover for existing spot request objects. We'll just let them exist outside of our real view as they should all disappear within a short time frame and not have too big an impact. Existing instances will continue to be tracked, so we shouldn't lose track of them.

There are some SQL schema changes. I've written a script (sql/deploy-new-spot-changes.sql) which I've tested to run and edit the database as expected.

The major changes:

  1. Removal of all handling for spot instance requests and for making calls to requestSpotInstance(s)
  2. Implementation of runInstances, with logic for handling spot requests
  3. Resulting simplification of transaction logic for instance state manipulation
  4. Temporary removal of EBS volume monitoring -- it was broken and I couldn't figure out what was going wrong. Will be readded at a later date.
  5. Case preservation of Postgres column names -- should make it easier to do queries
jhford commented 6 years ago

cc @walac: Here's a preview of the branch I'm working against right now. Feel free to take a look at the review.

jhford commented 6 years ago

Note that this is a pretty extensive patch!

jhford commented 6 years ago

@imbstack hey, do you mind taking a look at the patch while I work on the provisioner patch?

jhford commented 6 years ago

This set of patches is ready for final review. Basically, I removed the support for spot requests anywhere. Because we have an atomic call for creating instances, we can know from our database which instances belong to us. This means that we can avoid doing a lot of describeInstances API calls, and the associated performance and reliability hits.

In the new system, we don't actually need to make any describe* calls as part of tracking instance state.

There's also preliminary support for tracking the creation of terminations. This table currently does not store codes or reasons for terminations, but in the future, we'll be using the State.findTerminationsToPoll() method to do batched polling of instances using the describeInstances API to determine termination reason so that we can build the biaser on that.

Please give this a good look as it's changing a lot of fundementals of how EC2-Manager works. I'm also happy to help with any questions about it, or hop on a vidyo chat if that makes things easier.

jhford commented 6 years ago

this landed!