riverqueue / river

Fast and reliable background jobs in Go
https://riverqueue.com
Mozilla Public License 2.0

how to customize/disable leadership election? #336

Closed elee1766 closed 4 months ago

elee1766 commented 4 months ago

in our system, we have processes that are not dynamically scaled, and others that are dynamically scaled.

processes which run tasks that do not scale on demand can therefore restart less often, and can have different caching rules since they know their caches are longer lived.

processes which run tasks that do scale on demand are dynamically scaled up to meet demand, and so they often restart multiple times a day during traffic spikes.

it would be great if we could make it so that our dynamically scaled processes don't participate in leader election, and always assume they are slaves. This would result in fewer points in time where we lose our leader and need to do re-election.

ultimately, it would be nice to provide our own election mechanism, as we could then deduplicate some work, and also provide custom election mechanisms like "always-follower" for our dynamically scaled processes.

bgentry commented 4 months ago

Are you encountering some kind of issue with the leader election switches that’s prompting this concern? The system should be pretty robust, and only restarting several times per day (instead of like every couple seconds) should not cause any problems whatsoever. It would probably be considered a bug if there were any issues caused by leader election switching every few hours.

elee1766 commented 4 months ago

> Are you encountering some kind of issue with the leader election switches that’s prompting this concern? The system should be pretty robust, and only restarting several times per day (instead of like every couple seconds) should not cause any problems whatsoever. It would probably be considered a bug if there were any issues caused by leader election switching every few hours.

hmmm ok yeah, there is a bit more nuance here. we are currently migrating our existing job system to river, so i'm trying to replicate some of its behavior

so basically, the task that is run by the leader is something like a summary which runs a few times a second. It has an in-memory cache to avoid lots of recalculation, so repeated invocations of this periodic task are fast when they run on the same node, but the first run on a new node can take longer than a few seconds, which is not good because we run the summary on 250ms intervals. as a result, there is a few seconds of lag in a ui that is supposed to update at least once per second, so even when our leadership re-election was basically instant, we would still have issues.

the original problem was actually that these giant summary jobs used to be scheduled on arbitrary workers, but they would cause our workers to use too many resources, so we ended up creating an election such that a single "leader node" with failover 'leader candidates' would be in charge of processing these jobs. that way we could split our deployments into high-resource leader candidates and low-resource task runners. river is quite appealing because you basically have this feature built in, and it doesn't need to be hacked on top of the already rather hacky job queue we are migrating from

so i think we can simulate the behavior in river by having some clients configured to stream and having the others poll with a very long interval, but it seems rather hacky. if you are confident that the system will function correctly in that case, then maybe it's okay.

another possible solution would be to use a completely different search path and a different database for these heavy duty periodic tasks, but that would require a good amount of changes in our code (maybe it is worth it)
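As an aside on the search-path idea: as far as I can tell River issues its queries against unqualified table names, so which schema it operates on comes down to the connection's search_path, which could make isolating the heavy periodic tasks mostly a pool-configuration change plus running River's migrations against that schema. A rough pgx sketch, where the river_heavy schema name and the connection string are made up for illustration:

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	ctx := context.Background()

	// Parse a normal connection string, then point this pool's search_path at
	// a dedicated schema ("river_heavy" is a made-up name for illustration).
	cfg, err := pgxpool.ParseConfig("postgres:///app")
	if err != nil {
		log.Fatal(err)
	}
	cfg.ConnConfig.RuntimeParams["search_path"] = "river_heavy"

	heavyPool, err := pgxpool.NewWithConfig(ctx, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer heavyPool.Close()

	// A River client built on heavyPool would see only the river tables in
	// river_heavy: its own jobs, its own leader election, its own periodic
	// jobs, separate from the main install on the default schema.
}
```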

bgentry commented 4 months ago

Gotcha, this sounds like something River isn’t really designed for today, at least not in the way you’re trying to do it. I had thought you were referring to River’s internal leader election and the internal services that depend on it (job cleaner, scheduler, etc). In particular, there’s no real concept of node affinity for a job—jobs are meant to be somewhat stateless and idempotent, and able to be processed by any client working a particular queue.

The good news is there might be a few options! The most obvious solution that comes to mind: would you be able to put these stateful jobs on their own named queue (call it stateful) and only set up clients to work that queue from the nodes which will stick around and which you want to schedule these long-running tasks on? Other jobs can stay on a more general purpose queue that all clients are able to work.
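To make the named-queue idea concrete, here is a rough sketch of what the client setup on a long-lived node might look like. The SummaryArgs/SummaryWorker types, the "stateful" queue name, the worker counts, and the connection string are placeholders for illustration, not anything from this thread; the river/riverpgxv5 calls are the library's normal client-setup API.

```go
package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
	"github.com/riverqueue/river/riverdriver/riverpgxv5"
)

// SummaryArgs and SummaryWorker stand in for the heavy summary job described
// above; the names are illustrative, not from the original thread.
type SummaryArgs struct{}

func (SummaryArgs) Kind() string { return "summary" }

type SummaryWorker struct {
	river.WorkerDefaults[SummaryArgs]
}

func (w *SummaryWorker) Work(ctx context.Context, job *river.Job[SummaryArgs]) error {
	// ... expensive summary calculation backed by an in-memory cache ...
	return nil
}

func main() {
	ctx := context.Background()

	dbPool, err := pgxpool.New(ctx, "postgres:///app")
	if err != nil {
		log.Fatal(err)
	}

	workers := river.NewWorkers()
	river.AddWorker(workers, &SummaryWorker{})

	// On the long-lived, high-resource nodes: work the dedicated "stateful"
	// queue in addition to the general-purpose default queue.
	client, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
		Queues: map[string]river.QueueConfig{
			river.QueueDefault: {MaxWorkers: 100},
			"stateful":         {MaxWorkers: 1},
		},
		Workers: workers,
	})
	if err != nil {
		log.Fatal(err)
	}

	// On the dynamically scaled nodes the Queues map would simply omit
	// "stateful", so those clients never fetch the heavy jobs.

	if err := client.Start(ctx); err != nil {
		log.Fatal(err)
	}
}
```

Jobs destined for the heavy nodes would then be inserted with river.InsertOpts{Queue: "stateful"} so that only the long-lived clients ever fetch them.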

It also sounds like you’re looking to ensure that only one of these summary jobs is running at a given moment. This is also something there’s no official support for, though you can hack it in with your own locking mechanism. However, it’s something we have thought about and hope to implement.
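Since a hand-rolled locking mechanism is mentioned here, one hedged sketch of what that could look like is a transaction-level Postgres advisory lock taken inside the worker, so at most one summary does real work at a time. The lock key, the snooze interval, and the dbPool field on the worker are all assumptions, and holding a transaction open for the duration of the work is a real tradeoff; this is one possible approach, not an official River feature.

```go
package workers

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

// SummaryArgs is the same hypothetical job type as in the earlier sketch.
type SummaryArgs struct{}

func (SummaryArgs) Kind() string { return "summary" }

// SummaryWorker holds a pgx pool so it can take a Postgres advisory lock.
type SummaryWorker struct {
	river.WorkerDefaults[SummaryArgs]
	dbPool *pgxpool.Pool
}

// Arbitrary, application-chosen advisory lock key (illustrative value).
const summaryLockKey = int64(42)

func (w *SummaryWorker) Work(ctx context.Context, job *river.Job[SummaryArgs]) error {
	tx, err := w.dbPool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx)

	// Transaction-level advisory lock: released automatically on commit or
	// rollback, so the lock is held exactly as long as this transaction.
	var acquired bool
	if err := tx.QueryRow(ctx, "SELECT pg_try_advisory_xact_lock($1)", summaryLockKey).Scan(&acquired); err != nil {
		return err
	}
	if !acquired {
		// Another client is already running a summary; set this job aside and
		// have River retry it after a short delay.
		return river.JobSnooze(5 * time.Second)
	}

	// ... expensive summary calculation goes here, protected by the lock ...

	return tx.Commit(ctx)
}
```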

Another consideration is that River already has a solid leader election system and a setup for running tasks only when the Client is the leader, but this isn’t exposed directly today. That’s something we’ve thought about too, but not sure how high it will be on our list.

elee1766 commented 4 months ago

> Gotcha, this sounds like something River isn’t really designed for today, at least not in the way you’re trying to do it. I had thought you were referring to River’s internal leader election and the internal services that depend on it (job cleaner, scheduler, etc). In particular, there’s no real concept of node affinity for a job—jobs are meant to be somewhat stateless and idempotent, and able to be processed by any client working a particular queue.
>
> The good news is there might be a few options! The most obvious solution that comes to mind: would you be able to put these stateful jobs on their own named queue (call it stateful) and only set up clients to work that queue from the nodes which will stick around and which you want to schedule these long-running tasks on? Other jobs can stay on a more general purpose queue that all clients are able to work.
>
> It also sounds like you’re looking to ensure that only one of these summary jobs is running at a given moment. This is also something there’s no official support for, though you can hack it in with your own locking mechanism. However, it’s something we have thought about and hope to implement.
>
> Another consideration is that River already has a solid leader election system and a setup for running tasks only when the Client is the leader, but this isn’t exposed directly today. That’s something we’ve thought about too, but not sure how high it will be on our list.

i was under the impression that PeriodicJobs would only be run by the leader? (https://riverqueue.com/docs/periodic-jobs)

right now the job is set up to run 4 times a second and will only calculate when it finds there are more things to calculate.

i was planning to leverage PeriodicJobs being only run by the leader, which is what led me to look into whether i could modify/restrict leadership in the election to a subset of candidates.

bgentry commented 4 months ago

> i was under the impression that PeriodicJobs would only be run by the leader?

Not quite, periodic jobs are only enqueued or inserted by the leader; once enqueued they are run normally across all Clients working a given queue, just like any other job.
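To connect the two halves: the leader only inserts the periodic job, and the queue named in its insert options controls which clients can actually work it. A rough sketch of what that registration could look like, reusing the hypothetical SummaryArgs/SummaryWorker and "stateful" queue from the earlier sketches; the 250ms interval mirrors the four-times-a-second schedule described above, and whether a sub-second periodic interval is a good fit in practice is worth verifying separately.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
	"github.com/riverqueue/river/riverdriver/riverpgxv5"
)

// SummaryArgs and SummaryWorker are the same hypothetical types as in the
// earlier sketches; they stand in for the heavy summary job.
type SummaryArgs struct{}

func (SummaryArgs) Kind() string { return "summary" }

type SummaryWorker struct {
	river.WorkerDefaults[SummaryArgs]
}

func (w *SummaryWorker) Work(ctx context.Context, job *river.Job[SummaryArgs]) error {
	// ... cached summary calculation ...
	return nil
}

func main() {
	ctx := context.Background()

	dbPool, err := pgxpool.New(ctx, "postgres:///app")
	if err != nil {
		log.Fatal(err)
	}

	workers := river.NewWorkers()
	river.AddWorker(workers, &SummaryWorker{})

	// Configuration for a long-lived node: it works the "stateful" queue and
	// also registers the periodic job. Whichever client happens to be leader
	// does the inserting, but only clients configured with the "stateful"
	// queue will ever fetch and run the job.
	client, err := river.NewClient(riverpgxv5.New(dbPool), &river.Config{
		PeriodicJobs: []*river.PeriodicJob{
			river.NewPeriodicJob(
				// Note: verify that a sub-second interval behaves acceptably;
				// a coarser interval with internal looping is an alternative.
				river.PeriodicInterval(250*time.Millisecond),
				func() (river.JobArgs, *river.InsertOpts) {
					return SummaryArgs{}, &river.InsertOpts{Queue: "stateful"}
				},
				&river.PeriodicJobOpts{RunOnStart: true},
			),
		},
		Queues: map[string]river.QueueConfig{
			"stateful": {MaxWorkers: 1},
		},
		Workers: workers,
	})
	if err != nil {
		log.Fatal(err)
	}

	if err := client.Start(ctx); err != nil {
		log.Fatal(err)
	}
}
```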

elee1766 commented 4 months ago

> > i was under the impression that PeriodicJobs would only be run by the leader?
>
> Not quite, periodic jobs are only enqueued or inserted by the leader; once enqueued they are run normally across all Clients working a given queue, just like any other job.

ahh i see. ok. I will try playing around with a queue tied to a process and see what happens.