swindon-rs / swindon

An HTTP edge (frontend) server with smart websockets support
Apache License 2.0

Fix inactivity callback in clustered setup #35

Open · tailhook opened this issue 7 years ago

tailhook commented 7 years ago

Well, I don't understand how 2dfe91a fixes the issue.

tailhook commented 7 years ago

The proposed strategy is:

  1. Sync all inactivity timers across all the replicating nodes, probably by grouping updates into batches with 100 ms to 1 s of latency.
  2. Split the session namespace into buckets using consistent hashing, and assign a 1/n share of the sessions to each node.
  3. Notify the other nodes about inactivity callbacks that have been sent, using a technique similar to (1).
  4. Also assign each server's buckets to the other servers, with a delay, e.g.:
    • buckets of server2 to server1 with a delay of 10 seconds
    • buckets of server3 to server2 with a delay of 10 seconds
    • buckets of server3 to server1 with a delay of 20 seconds, and so on
  5. Cancel calling the handler if another server reports that it has already sent the callback.
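To make steps 2 and 4 concrete, here is a minimal sketch of how a node might compute which server owns a session and how long each other server waits before covering for it. It is an illustration under assumptions, not swindon's code: the bucket count (`NUM_BUCKETS`), the FNV-1a session hash, and the use of Lamping & Veach's jump consistent hash as the "consistent hashing" are all hypothetical choices. Servers are numbered from 0, and the wrap-around (server 0's buckets falling back to the last server first) is an assumption, since the proposal only lists examples.

```rust
use std::time::Duration;

/// Hypothetical number of buckets the session namespace is split into.
const NUM_BUCKETS: u64 = 1024;
/// Per-hop fallback delay, matching the 10-second example in the proposal.
const FALLBACK_STEP: Duration = Duration::from_secs(10);

/// FNV-1a hash; chosen because every node must compute the same value
/// (std's DefaultHasher is not guaranteed to be stable across builds).
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Jump consistent hash (Lamping & Veach): maps `key` to one of `n` servers
/// so that going from n to n + 1 servers moves only keys onto the new server.
fn jump_hash(mut key: u64, n: u32) -> u32 {
    let (mut b, mut j): (i64, i64) = (-1, 0);
    while j < n as i64 {
        b = j;
        key = key.wrapping_mul(2_862_933_555_777_941_757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Step 2a: session id -> fixed bucket.
fn bucket_of(session_id: &str) -> u64 {
    fnv1a(session_id.as_bytes()) % NUM_BUCKETS
}

/// Step 2b: bucket -> owning server (0-based index).
fn owner_of(bucket: u64, servers: u32) -> u32 {
    jump_hash(bucket, servers)
}

/// Step 4: extra wait `node` applies to buckets owned by `owner`:
/// 0 for the owner, 10 s for the previous server, 20 s for the one before, ...
fn fallback_delay(owner: u32, node: u32, servers: u32) -> Duration {
    let hops = (owner + servers - node) % servers;
    FALLBACK_STEP * hops
}

fn main() {
    let servers = 3;
    let bucket = bucket_of("session-42");
    let owner = owner_of(bucket, servers);
    for node in 0..servers {
        let delay = fallback_delay(owner, node, servers);
        println!("bucket {bucket} (owner {owner}): server {node} waits +{delay:?}");
    }
}
```

With 0-based indices, `fallback_delay(1, 0, 3)` is 10 s and `fallback_delay(2, 0, 3)` is 20 s, which corresponds to the proposal's "server2 to server1" and "server3 to server1" examples.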

This means that if one of the servers fails or lags too much, we delay its messages by just 10 seconds, but all inactivity callbacks are still sent (in complex failure scenarios some of them may be duplicated, which is acceptable). It also avoids introducing complex failure detection and leader election algorithms.
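The failure behaviour described above can be checked with a small, hypothetical model of one node's pending-callback table (steps 1, 3 and 5 combined). Everything here is an assumption made for illustration: `Pending`, `track`, `on_peer_sent` and `tick` are invented names, times are plain integer seconds on a shared clock, networking is omitted, and the ring wrap-around for fallback order is assumed.

```rust
use std::collections::HashMap;

/// Hypothetical per-hop fallback delay in seconds (10 s in the proposal).
const FALLBACK_STEP_SECS: u64 = 10;

/// One node's view of pending inactivity callbacks.
struct Pending {
    node: u32,
    servers: u32,
    /// session id -> (owner server, inactivity deadline), replicated via step 1.
    entries: HashMap<String, (u32, u64)>,
}

impl Pending {
    fn new(node: u32, servers: u32) -> Self {
        Pending { node, servers, entries: HashMap::new() }
    }

    /// Record or refresh a session's deadline received from any node.
    fn track(&mut self, session: &str, owner: u32, deadline: u64) {
        self.entries.insert(session.to_string(), (owner, deadline));
    }

    /// Step 5: a peer reported the callback as sent, so never fire it here.
    fn on_peer_sent(&mut self, session: &str) {
        self.entries.remove(session);
    }

    /// Fire every callback whose deadline plus this node's fallback delay has
    /// passed. Returns the fired sessions, which would be broadcast (step 3).
    fn tick(&mut self, now: u64) -> Vec<String> {
        let mut fired = Vec::new();
        for (session, &(owner, deadline)) in &self.entries {
            let hops = u64::from((owner + self.servers - self.node) % self.servers);
            if now >= deadline + hops * FALLBACK_STEP_SECS {
                fired.push(session.clone());
            }
        }
        fired.sort();
        for session in &fired {
            self.entries.remove(session);
        }
        fired
    }
}

fn main() {
    let servers = 3;
    // Server 1 owns "sess-a" but has crashed; servers 0 and 2 are alive.
    let mut s0 = Pending::new(0, servers);
    let mut s2 = Pending::new(2, servers);
    s0.track("sess-a", 1, 100);
    s2.track("sess-a", 1, 100);

    // Server 0 is one hop behind the owner, so it waits until t = 110.
    println!("t=105: {:?}", s0.tick(105));
    let fired = s0.tick(110);
    println!("t=110: {:?}", fired);

    // The broadcast reaches server 2 before its own 20 s fallback expires.
    for session in &fired {
        s2.on_peer_sent(session);
    }
    println!("t=120 on server 2: {:?}", s2.tick(120));
}
```

In this model a dead owner delays the callback by one 10-second step, and a duplicate only happens if the step-3 notification is slower than the gap between two fallback delays, which matches the "can be duplicated, that's fine" trade-off.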

@popravich ?