sprinkle-tool / sprinkle

Sprinkle is a software provisioning tool you can use to build remote servers with. eg. to install a Rails, or Sinatra stack on a brand new slice directly after its been created
https://github.com/sprinkle-tool/sprinkle
MIT License
1.15k stars 138 forks source link

Feature: provide optional threading on ssh actor #155

Closed troelskn closed 11 years ago

troelskn commented 11 years ago

This change enables the ssh actor to run commands against multiple hosts in parallel. I took the main idea from here: https://github.com/sprinkle-tool/sprinkle/pull/49 which was never followed up on.

The change is backwards compatible in that one have to explicitly enable it and it only takes effect if there are more than 1 target host. I've tested it and it seems to run fine. Speedups are as expected - there are no noticeable difference between addressing one or many nodes. (To a point, I'm sure)

Sorry about the whitespace changes - I kept them in a separate commit.

joshgoebel commented 11 years ago

Not going to accept this now. Either SSH needs to work just like Capistrano (and do multiple sessions WITHOUT threads). Or we need to decide the right direction for both SSH and Capistrano is to go multi-threaded and then support advanced features by doing so. The fact is threads complicate things and it's possible to do what we're already doing without threads (see Capistrano). IE, how does this work with 100 hosts? What happens when one throws an error but not the others?

The big reason to go multi-threaded would to be to allow the packages to actually execute real-time per-server... until we are doing that I don't see the additional complexity worth it, even if it's configurable on/off.

troelskn commented 11 years ago

You probably have better insight in the tradeoffs here than I, but I'm not clear on your last paragraph. I tried this out and it does indeed execute much faster that way. Or what do you mean exactly by "allow the packages to actually execute real-time per-server"?

joshgoebel commented 11 years ago

I mean allow the packages to have code like (bad example):

if server.name == "bigdb"
 runner "custom code for only bigdb"

Unless we can do that multi-threaded is NOT necessary... the same thing can be achieved by just managing the IO on multiple SSH connections properly (like Capistrano does).

joshgoebel commented 11 years ago

I've long thought that perhaps the SSH actor should die and be replaced by Capistrano internally... but not sure Cap supports ALL the options we currently support with SSH.

troelskn commented 11 years ago

Ah. So basically, currently packages aren't thread safe, so weird stuff may happen if the packages have conditionals? I'll have a look at how Capistrano does it and see if it's something I could patch together as the speedup makes a big difference to my setup.

Re replacing ssh with Capistrano, I think it's worthwhile to have a self-contained actor. It's really easy to figure out what's going on now - Having to dig through Capistrano could complicate things a bit for debugging.

joshgoebel commented 11 years ago

currently packages aren't thread safe

It's not that... a package should be thought of as a definition, not Ruby code... they are "statically compiled" and then executed on a bunch of servers.. they aren't "run" in the context of each server (as some people assume). As long as they are statically compiled then Ruby multi-threading is NOT needed. Unless this changes the proper way to have reliable multiple SSH sessions is to simply open them in a queue/loop and manage the IO. It's easy to talk to 100 servers over SSH in Ruby without any threads at all.

I think it's worthwhile to have a self-contained actor. It's really easy to figure out what's going on now

I'm talking from the point of view of end users... They shouldn't have to know that there are small changes in behavior between actors... or that things will change if they switch actors... originally we have support for people who have their setups done differently... some people already have hosts, etc defined in Capistrano, so lets use that... but there is no reason we couldn't ALWAYS use Capistrano as a dependency even for raw SSH and stop talking to the SSH library directly.

One big-win would be you'd instantly get simultaneous execution - since that is a built-in feature of Capistrano.

troelskn commented 11 years ago

OK - Thanks for the elaboration.

As for using Capistrano exclusively, the simplification of eliminating choices is certainly also a feature worth considering. I guess I'll look at using the Capistrano actor instead and see how that goes.

joshgoebel commented 11 years ago

Capistrano also has built in support for answering the sudo password prompt, etc... :) Give it a shot... I'm curious what you have to change to make it work... shouldn't be much.

troelskn commented 11 years ago

I did and it was surprisingly easy to switch. One thing that isn't very clear from the documentation is that even when using Capistrano, you don't actually have to write capfiles etc. - it's really just the low level libraries that are reused. This was part of why I aimed for using the ssh actor to begin with.

troelskn commented 11 years ago

Closing this as it is the wrong way to approach this.