solutious / rudy

Not your grandparents' EC2 deployment tool.
http://solutious.com/projects/rudy/
MIT License

Spot instances #57

Closed: Fluxx closed this 13 years ago

Fluxx commented 13 years ago

Here are the beginnings of the spot instance request integration. I'm not done with everything (I still have to add some command-line options and error checking), but the meat of the functionality is there. I hope to use this pull request to get feedback and manage development of this feature before merging it back.

To enable spot instances, I created a pricing config option for your Rudyfile. Valid options are :on_demand (for normal instances) and :spot (for spot instances). If you use :spot, you need to pass a block in which you set your bid price, e.g.:

pricing :spot do
  bid 2.00
end
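A minimal sketch of how a pricing directive like the one above could be parsed with instance_eval. PricingConfig and SpotPricing are hypothetical names for illustration, not Rudy's actual classes, and the validation rules (reject unknown pricing types, require a bid for :spot) are assumptions.

```ruby
# Hypothetical classes illustrating a Rudyfile-style `pricing` directive;
# not Rudy's real implementation.
class SpotPricing
  attr_reader :bid_price

  # Called as `bid 2.00` inside the `pricing :spot do ... end` block.
  def bid(price)
    @bid_price = Float(price)
  end
end

class PricingConfig
  VALID_TYPES = [:on_demand, :spot].freeze

  attr_reader :type, :bid_price

  def pricing(type, &block)
    raise ArgumentError, "unknown pricing type: #{type.inspect}" unless VALID_TYPES.include?(type)

    @type = type
    return unless type == :spot

    # Evaluate the block with a SpotPricing as self so `bid` resolves to SpotPricing#bid.
    spot = SpotPricing.new
    spot.instance_eval(&block) if block
    raise ArgumentError, "pricing :spot requires a bid price" if spot.bid_price.nil?

    @bid_price = spot.bid_price
  end

  def spot?
    @type == :spot
  end
end

# Usage: evaluate Rudyfile-like code against a PricingConfig.
config = PricingConfig.new
config.instance_eval do
  pricing :spot do
    bid 2.00
  end
end
puts config.spot?      # prints "true"
puts config.bid_price  # prints "2.0"
```

Using instance_eval lets the Rudyfile read like plain configuration while still being ordinary Ruby method calls.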

The additions for using spot instances are as follows:

  1. As part of the startup routine, it checks to see if spot instances are needed.
  2. If they are, it calls Rudy::Routines::Handlers::SpotRequest.create, which calls the EC2 API to create the spot instance requests and then waits for them to be fulfilled (i.e., for the instances to boot).
  3. Once the spot requests have been fulfilled, SpotRequest.create returns an array of Rudy::AWS::EC2::SpotRequest objects, which are passed to Rudy::Machines.from_spot_request; that method actually creates the machines and saves them to SDB (SimpleDB).
  4. The saved machines are then passed to RyeTools.create_set (as they would be from Rudy::Machines.create), and the startup routine goes on its merry way.
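The four steps above can be sketched as a self-contained flow. Everything here is hypothetical: FakeSpotApi stands in for the EC2 API, wait_for_fulfillment for the polling inside SpotRequest.create, and the returned instance IDs for the machines that Rudy::Machines.from_spot_request would build. The polling interval and max_polls limit are assumed values, not Rudy's.

```ruby
# Hypothetical, self-contained sketch of a spot-aware startup flow.
# FakeSpotApi stands in for the EC2 API; none of these names are Rudy's.
SpotRequest = Struct.new(:id, :state, :instance_id)

class FakeSpotApi
  # Every request reports "active" (with an instance ID) on the Nth describe call.
  def initialize(polls_needed)
    @polls_needed = polls_needed
    @polls = 0
  end

  def request_spot_instances(count, _bid)
    Array.new(count) { |i| SpotRequest.new("sir-#{i}", "open", nil) }
  end

  def describe(requests)
    @polls += 1
    return requests if @polls < @polls_needed

    requests.map { |r| SpotRequest.new(r.id, "active", "i-#{r.id}") }
  end
end

# Step 2: poll until every request is active, or give up after max_polls.
def wait_for_fulfillment(api, requests, interval: 0, max_polls: 10)
  max_polls.times do
    requests = api.describe(requests)
    return requests if requests.all? { |r| r.state == "active" }

    sleep interval
  end
  raise "spot requests not fulfilled after #{max_polls} polls"
end

# Steps 1-3: pick the pricing path and return instance IDs; in Rudy these
# would become machines handed to RyeTools.create_set (step 4).
def start_machines(api, pricing:, count:, bid: nil)
  return Array.new(count) { |i| "i-ondemand-#{i}" } unless pricing == :spot

  requests = api.request_spot_instances(count, bid)
  wait_for_fulfillment(api, requests).map(&:instance_id)
end

p start_machines(FakeSpotApi.new(3), pricing: :spot, count: 2, bid: 2.0)
# prints ["i-sir-0", "i-sir-1"]
```

Raising after a bounded number of polls gives the caller a hook for the unfinished error handling, instead of relying only on the user aborting.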

If the spot requests are never fulfilled, the user can abort the startup routine and then do whatever they like. I've thought about adding functionality to cancel the spot requests and/or fall back to on-demand instances, but haven't done that yet; that's part of the "error handling" I haven't finished. I'm eager to hear what you think.

Also, to make it easier for me to work on the project, I added a Gemfile and changed the shebang in the bin files to #!/usr/bin/env ruby, which works more universally. The previous shebang, #!/usr/bin/ruby, didn't work for me (and won't for lots of other people) because I use RVM. I'm happy to rebase my branch and remove the Gemfile or shebang commits if you'd like, though.

delano commented 13 years ago

Looks great. I pulled it into the main repo. Could you add an example to examples/ as well?

Re: error handling, I haven't used spot instances, but offhand I'd say it's probably better without the extra functionality.

Fluxx commented 13 years ago

Thanks! I'll take a stab at writing an example. I ran the test suite on Friday and got it to run, though not all the tests passed, and I was surprised that it nuked all the keypairs in my AWS account :/ I'll see if I can get a few tests written, though.

Also, on the test suite: not sure if you knew about it or would be interested, but there's a great gem called VCR that records HTTP interactions during tests and then replays them from saved files. That keeps the tests fast and stops them from needing to talk to the EC2 API (and actually change state there) during the run. Thought it might help here.
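A toy illustration of the record/replay idea, assuming a JSON file as the "cassette". This is not VCR's API: the real gem intercepts requests at the HTTP-library level and has its own configuration, so its documentation is the place to look before using it. Cassette and fetch are invented names.

```ruby
# Toy record/replay cache illustrating the idea behind VCR (not VCR's API).
# First run: perform the real call and save the response under a key.
# Later runs: replay the saved response without touching the network.
require "json"
require "tmpdir"

class Cassette
  def initialize(path)
    @path = path
    @interactions = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # key identifies the request, e.g. "GET /?Action=DescribeKeyPairs"
  def fetch(key)
    return @interactions[key] if @interactions.key?(key) # replay

    response = yield                                      # record: real call
    @interactions[key] = response
    File.write(@path, JSON.generate(@interactions))
    response
  end
end
```

Because replayed tests never reach EC2, a destructive call such as deleting keypairs would only ever hit the account once, while recording.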

Fluxx commented 13 years ago

Ohhhh I just realized you meant examples of configuration as opposed to tests. I'll add that to another pull request :)

delano commented 13 years ago

Crap, sorry about the keypairs. I built the tests to run using non-production credentials. I disabled that specific test and added a message to make that clearer.

Thanks for mentioning VCR. I'll check it out.
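One way to make the non-production requirement explicit is a guard that skips destructive tests unless the environment opts in. RUDY_TEST_ACCOUNT and run_destructive are hypothetical names; this is a sketch of the idea, not how Rudy's suite is actually organized.

```ruby
# Hypothetical guard for destructive integration tests (e.g. deleting keypairs).
# RUDY_TEST_ACCOUNT is an invented variable: setting it to "1" asserts that the
# configured AWS credentials belong to a throwaway, non-production account.
def destructive_tests_allowed?(env = ENV)
  env["RUDY_TEST_ACCOUNT"] == "1"
end

def run_destructive(name, env = ENV)
  unless destructive_tests_allowed?(env)
    warn "SKIP #{name}: set RUDY_TEST_ACCOUNT=1 only with non-production AWS credentials"
    return :skipped
  end
  yield
  :ran
end
```

Defaulting to "skip" means a contributor who runs the suite with their everyday credentials sees a message instead of losing resources.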

delano commented 13 years ago

I tested it out. There are a couple of things we'll need to take care of before releasing it:

Fluxx commented 13 years ago

> When you cancel the spot request, it needs to send a CancelSpotInstanceRequestsResponseSetItemType request. Otherwise, the requests will continue and eventually be fulfilled, but Rudy won't know about it (and it'll silently charge you in the background...)

Will do.
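A minimal sketch of cancelling outstanding requests when the wait is aborted or fails. The EC2 API action for this is CancelSpotInstanceRequests; the cancel_spot_requests method and RecordingApi class below are hypothetical stand-ins for whatever client call Rudy would use.

```ruby
# Hypothetical stand-in for an EC2 client; records which request IDs were cancelled.
class RecordingApi
  attr_reader :cancelled

  def initialize
    @cancelled = []
  end

  # Would issue EC2's CancelSpotInstanceRequests action in a real client.
  def cancel_spot_requests(ids)
    @cancelled.concat(ids)
  end
end

# Run the wait (the block); if the user hits Ctrl-C or the wait fails,
# cancel the open requests before re-raising. Otherwise they stay open,
# may be fulfilled later, and bill the account for unknown instances.
def wait_or_cancel(api, request_ids)
  yield
rescue Interrupt, StandardError
  api.cancel_spot_requests(request_ids)
  raise
end
```

Re-raising after cancelling keeps the abort visible to the user while still cleaning up on EC2's side.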

> All instances are starting up in us-east-1a, regardless of the config. I think you also need to send the parameter launched_availability_zone. The instance type is always m1.small.

Hrm, that's weird. I've already started some spot-based instances for our project in us-west and with other sizes (m1.large, m2.xlarge, m2.2xlarge, etc.), and they started fine. I'll try reproducing with the latest code and see what's up.

Fluxx commented 13 years ago

So I'm trying to repro the bug where the region/instance size from the config is not respected, and I'm unable to do so. I've tried a few combinations of globals vs. machines config, and it seems to pick up all my changes. Can you see if you're still affected by it, and if so, send me your exact Rudyfile, revision SHA, and repro steps (commands, etc.) so I can try to repro it?

P.S. I'm gonna work on the CancelSpotInstanceRequestsResponseSetItemType fix right now, so expect a pull request for that soon.
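A sketch of the globals vs. machines precedence being tested above, assuming per-machine settings override globals, which in turn override built-in defaults. The :zone and :size keys and the DEFAULTS values are assumptions for illustration (chosen to mirror the symptom delano reported), not Rudy's real config schema.

```ruby
# Hypothetical defaults; a code path that ignored the Rudyfile entirely would
# always launch with exactly these values.
DEFAULTS = { zone: "us-east-1a", size: "m1.small" }.freeze

# Later merges win: machines config > globals > defaults. nil values are
# dropped so an unset key does not clobber a lower-precedence setting.
def resolve_launch_settings(globals, machines)
  DEFAULTS.merge(globals.compact).merge(machines.compact)
end
```

If the spot request path read only DEFAULTS instead of the merged result, every instance would land in the same zone and size regardless of the config.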

delano commented 13 years ago

I haven't forgotten about this btw. I'm just in the midst of moving blamestella.com off of EC2. Should be able to follow up again later this week.

delano commented 13 years ago

Hey, any updates? I have more time now so let me know if you want some help with it.

Fluxx commented 13 years ago

I think things are in pretty good shape. We've been using our own fork internally for a couple of weeks now, and things seem to be happy. There is one bug with the spot requests: I'm making them with an :availability_zone_group tag to the EC2 API, which just tags all the machines in the group with a specific tag to guarantee that they all boot in the same AZ, not in a specific AZ. I'll need to change that to use the right option so they actually launch in the specified AZ.
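A sketch of the request parameters involved, using EC2 Query API names for the RequestSpotInstances action: AvailabilityZoneGroup keeps all instances in one zone of EC2's choosing, while LaunchSpecification.Placement.AvailabilityZone pins a specific zone. The spot_request_params helper is hypothetical, the AMI ID in the test is a placeholder, and request signing and sending are omitted.

```ruby
# Hypothetical helper that builds RequestSpotInstances Query API parameters.
# zone:       pins the Availability Zone (LaunchSpecification.Placement.AvailabilityZone)
# zone_group: only keeps all instances in the same, EC2-chosen zone (AvailabilityZoneGroup)
def spot_request_params(bid:, count:, ami:, size:, zone: nil, zone_group: nil)
  params = {
    "Action" => "RequestSpotInstances",
    "SpotPrice" => bid.to_s,
    "InstanceCount" => count.to_s,
    "LaunchSpecification.ImageId" => ami,
    "LaunchSpecification.InstanceType" => size
  }
  params["LaunchSpecification.Placement.AvailabilityZone"] = zone if zone
  params["AvailabilityZoneGroup"] = zone_group if zone_group
  params
end
```

Passing the configured zone through Placement, and the instance type through LaunchSpecification.InstanceType, is what makes the Rudyfile settings reach EC2; the zone group alone cannot pick a zone.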