rax-maas / dreadnot

deploy without dread
Apache License 2.0

recover from timeout #24

Open · abh opened this issue 12 years ago

abh commented 12 years ago

If a deployment job "hangs", there doesn't seem to be a way to recover other than restarting the process.

russellhaering commented 12 years ago

Yeah, this is a problem we need to figure out how to deal with. Much of the difficulty stems from the fact that Node has no convention for canceling or stopping a set of related ongoing operations.

One approach would be to develop our own convention: attach a reference to every ongoing operation to the baton, so that if a deployment times out (or is canceled), all of those operations can be canceled or killed where possible.
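A minimal sketch of that first approach, assuming a hypothetical `CancellableBaton` class (Dreadnot's real baton object is not shown in this thread). Each in-flight operation registers a cancel callback and unregisters it when it finishes normally; a deployment-wide timer (the hypothetical `armTimeout`) cancels whatever is still registered.

```typescript
// Hypothetical sketch: not Dreadnot's actual baton API.
type Canceler = (reason: string) => void;

class CancellableBaton {
  private cancelers = new Set<Canceler>();
  private cancelledReason: string | null = null;

  // Register a cleanup function for an in-flight operation. Returns a
  // function the operation calls when it completes normally, so finished
  // work is not "cancelled" later.
  register(cancel: Canceler): () => void {
    if (this.cancelledReason !== null) {
      // Deployment already cancelled: stop the new operation right away.
      cancel(this.cancelledReason);
      return () => {};
    }
    this.cancelers.add(cancel);
    return () => {
      this.cancelers.delete(cancel);
    };
  }

  get cancelled(): boolean {
    return this.cancelledReason !== null;
  }

  // Best-effort cancellation of every registered operation. Returns how
  // many were cancelled; a second call is a no-op.
  cancelAll(reason: string): number {
    if (this.cancelledReason !== null) return 0;
    this.cancelledReason = reason;
    const pending = Array.from(this.cancelers);
    this.cancelers.clear();
    for (const cancel of pending) {
      try {
        cancel(reason);
      } catch {
        // One failing canceler must not stop the others.
      }
    }
    return pending.length;
  }
}

// Arm a deployment-wide timeout; the returned function disarms it.
function armTimeout(baton: CancellableBaton, ms: number): () => void {
  const timer = setTimeout(() => baton.cancelAll(`timed out after ${ms} ms`), ms);
  return () => clearTimeout(timer);
}
```

For a spawned command, a step might call `const done = baton.register(() => child.kill())` and then `done()` in its exit handler. The weakness is the one raised above: every operation has to opt in, and anything that forgets to register keeps running after the timeout.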

A second approach, which seems much safer to me, is to run deployments in a subprocess (or maybe use isolates?). This has other potential advantages as well (for example, an uncaught error thrown in a deployment wouldn't crash Dreadnot), so pursuing it is definitely high on my list of priorities.
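A rough sketch of the subprocess approach using Node's `child_process.spawn`. The helper `runWithTimeout` and its `graceMs` parameter are hypothetical, not part of Dreadnot. On timeout it sends `SIGTERM`, then escalates to `SIGKILL` if the child has not exited within the grace period. Because the work runs in a separate OS process, killing it reclaims everything it was doing without each operation having to cooperate.

```typescript
import { spawn } from "child_process";

// Outcome of one deployment run in a child process.
interface RunResult {
  code: number | null; // exit code, or null if the child was killed by a signal
  signal: string | null; // terminating signal, if any
  timedOut: boolean; // true if our timer fired before the child exited
}

// Hypothetical helper: a hang or an uncaught exception in the child
// cannot take down the parent server process.
function runWithTimeout(
  command: string,
  args: string[],
  timeoutMs: number,
  graceMs = 5000,
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "inherit" });
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM"); // ask the child to stop first
      // Force-kill if it ignores SIGTERM for too long.
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    }, timeoutMs);

    const clearTimers = () => {
      clearTimeout(timer);
      if (killTimer !== undefined) clearTimeout(killTimer);
    };

    // Emitted if the process could not be spawned (e.g. command not found).
    child.on("error", (err) => {
      clearTimers();
      reject(err);
    });

    child.on("exit", (code, signal) => {
      clearTimers();
      resolve({ code, signal, timedOut });
    });
  });
}
```

A hypothetical call would be `runWithTimeout("node", ["deploy-step.js"], 10 * 60 * 1000)`, where `deploy-step.js` stands in for a script that performs one deployment. The parent then only needs to inspect `timedOut` and `code` to mark the deployment as failed and move on.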

abh commented 12 years ago

A subprocess would make sense to me. Isolates didn't make it far: http://groups.google.com/group/nodejs/msg/6b8b8a487d2ab817