raintank / worldping-api

Worldping Backend Service

Add cluster leader election #5

Closed woodsaj closed 8 years ago

woodsaj commented 8 years ago

Issue by woodsaj, Friday Jun 19, 2015 at 20:22 GMT. Originally opened as https://github.com/raintank/grafana/issues/228


There are a few features within the code base that should only run on one node at a time. This requires the nodes to co-ordinate that role amongst themselves.

Raft seems to be the new hotness when it comes to these things, so we should use that. CoreOS's etcd project includes a Raft implementation: https://godoc.org/github.com/coreos/etcd/raft

woodsaj commented 8 years ago

Comment by Dieterbe Tuesday Jun 23, 2015 at 00:06 GMT


1) Mind sharing a little bit about what those features are?
2) Can we get away with transactions on the database?
3) For alerting, I noticed you mentioned somewhere running only 1 job producer, but I thought we decided we actually wanted to run multiple alert job producers for HA: if jobs get consistently routed (by key), the consumers will drop jobs they've already processed anyway. This is a fairly simplistic method of HA. If you're thinking of running only 1 producer, and it dies and restarts somewhere else, then we also need to keep track of the last timestamp at which jobs were scheduled. In case it takes several seconds to restart a producer, the new producer should also process the missed ticks from the last few seconds. (I actually like this approach, it seems more efficient, but it also requires more operations/automation. Perhaps we should postpone this improvement until we're at a point where multiple producers bring too much overhead?)