Open Jmgr opened 3 years ago
Yes, this is something I've thought a bit about, especially as it relates to rolling cluster upgrades. I think a graceful shutdown would make sense. There would be a few components to this:
- Step down as partition leader (`ChangeLeaderOp` in Raft) and remove self from ISR (`ShrinkISROp`). This should be done gradually to avoid a flood of Raft ops. Also interrupt any clients currently subscribed.
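To make the "gradually" part concrete, here is a minimal sketch of a throttled drain loop. It is not Liftbridge code: the `partition`, `raftOp`, and `proposeFunc` types and the `drain` and `demo` functions are hypothetical stand-ins, and the op names are used only as string labels. It assumes the server can enumerate its partitions and that proposing an op is a blocking call.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// partition is a hypothetical stand-in for a partition hosted on this server.
type partition struct {
	name   string
	leader bool // true if this server currently leads the partition
}

// raftOp is a hypothetical stand-in for an operation proposed to Raft.
type raftOp struct {
	kind      string // "ChangeLeaderOp" or "ShrinkISROp", used as labels only
	partition string
}

// proposeFunc is a hypothetical hook that submits one op to Raft and blocks
// until it is accepted or fails.
type proposeFunc func(ctx context.Context, op raftOp) error

// drain handles at most `batch` partitions, then waits `pause` before the
// next batch, so Raft ops are spread out instead of arriving all at once.
// It stops early if ctx is cancelled or a proposal fails.
func drain(ctx context.Context, parts []partition, batch int, pause time.Duration, propose proposeFunc) error {
	if batch < 1 {
		batch = 1
	}
	for i, p := range parts {
		// Pause between batches, but not before the first one.
		if i > 0 && i%batch == 0 {
			select {
			case <-ctx.Done():
				return ctx.Err()
			case <-time.After(pause):
			}
		}
		// Only partitions this server leads need a leadership change.
		if p.leader {
			if err := propose(ctx, raftOp{"ChangeLeaderOp", p.name}); err != nil {
				return fmt.Errorf("step down on %s: %w", p.name, err)
			}
		}
		// Every hosted partition drops this server from its ISR.
		if err := propose(ctx, raftOp{"ShrinkISROp", p.name}); err != nil {
			return fmt.Errorf("shrink ISR on %s: %w", p.name, err)
		}
	}
	return nil
}

// demo drains three made-up partitions and records the ops in order.
func demo() ([]raftOp, error) {
	parts := []partition{{"orders-0", true}, {"orders-1", false}, {"events-0", true}}
	var ops []raftOp
	record := func(_ context.Context, op raftOp) error {
		ops = append(ops, op)
		return nil
	}
	// The timeout bounds the whole shutdown so it cannot hang forever.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err := drain(ctx, parts, 2, 10*time.Millisecond, record)
	return ops, err
}

func main() {
	ops, err := demo()
	if err != nil {
		fmt.Println("drain aborted:", err)
		return
	}
	for _, op := range ops {
		fmt.Printf("%s %s\n", op.kind, op.partition)
	}
}
```

The batch size and pause are the two knobs that trade shutdown time against Raft load. Interrupting subscribed clients is left out of the sketch, and a real implementation would probably also need to wait for each leadership change to take effect before shrinking the ISR.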
Currently, when a Liftbridge server shuts down, it stops being a leader for its partitions. If many partitions exist, that will result in a flurry of Raft events. Would it be possible to trigger a progressive shutdown to prevent this? Have you given this some thought, @tylertreat?