yougov / velociraptor

BSD 3-Clause "New" or "Revised" License

Support zero-downtime swarm restarts #109

Open ghost opened 10 years ago

ghost commented 10 years ago

Originally reported by: Brent Tubbs (Bitbucket: btubbs, GitHub: btubbs)


Right now the swarm restart feature is pretty dumb. It's driven by JavaScript in the UI, which just loops over each proc in the swarm and calls restart on each as fast as it can. In practice that means that the number of simultaneously restarting procs is bounded by the number of simultaneous HTTP connections you can make. For swarms with more procs than that number, we already get zero-downtime restarts because the first procs have come back up before the last procs have been taken down. For swarms with fewer procs, we have downtime because there's a second or two where all procs are down.

We could improve the swarm restart behavior by building it into the backend, with a little more brains behind it.
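
To make the idea concrete, here is a rough sketch of what a backend-driven rolling restart might look like. It is not Velociraptor's actual code: `rolling_restart`, `restart`, `wait_until_up`, and `max_down` are hypothetical names chosen for illustration. The assumption is that the backend restarts procs in small batches and waits for each batch to come back before touching the next, so at least one proc keeps serving whenever the swarm has more than one.

```python
from typing import Callable, List, Sequence


def rolling_restart(
    procs: Sequence[str],
    restart: Callable[[str], None],
    wait_until_up: Callable[[str], bool],
    max_down: int = 1,
) -> List[List[str]]:
    """Restart procs in batches and return the batches that were restarted.

    `restart` and `wait_until_up` are hypothetical callbacks supplied by the
    caller; `wait_until_up` should block until the proc is serving again and
    return False if it gave up (for example after a timeout).
    """
    if not procs:
        return []

    # Never take the whole swarm down: cap the batch at len(procs) - 1.
    # A single-proc swarm still gets a batch of one, so it cannot avoid
    # a brief outage with this approach.
    limit = max(1, min(max_down, len(procs) - 1))

    batches: List[List[str]] = []
    for start in range(0, len(procs), limit):
        batch = list(procs[start:start + limit])
        for proc in batch:
            restart(proc)
        # Wait for the whole batch before moving on to the next one.
        for proc in batch:
            if not wait_until_up(proc):
                raise RuntimeError(f"{proc} did not come back up; aborting")
        batches.append(batch)
    return batches
```

Aborting on the first proc that fails to come back is a deliberate choice in this sketch: it stops a bad build from being rolled across the procs that are still healthy.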


ghost commented 9 years ago

Original comment by Brent Tubbs (Bitbucket: btubbs, GitHub: btubbs):


Mathias took this on during our recent sprint and has an implementation mostly ready to go.

ghost commented 10 years ago

Original comment by Jason R. Coombs (Bitbucket: jaraco, GitHub: jaraco):


This issue could also be solved by implementing #74.

The proposal in #74 would

In my estimation, the only downside of #74 would be the inherent delay in uploading the procs.