Closed: jordaaash closed this issue 3 years ago.
I'd love to jump on the LISTEN/NOTIFY train to reduce polling, but I don't see how it could support clustered or distributed workers without a mess of duplicate jobs.
Hmm, okay. I don't know much about this, so here's how I imagine it would go. The workers would receive a notification and race to obtain a row lock in a transaction and update the row. The first worker to update the status to active will see the row updated; the other workers' updates would go through without conflict. Will the others see an updated-row count of 0 or 1?
I don't know much about it either. It would be good to see a prototype and throw some use cases at it so I can develop a more informed opinion. 😁
"Will the others see the rows updated count as 0 or 1?"
What does this mean?
The way I understand these lines, the first worker to obtain the lock would update the row and get the response UPDATE 1. Other workers that select the row would either fail to obtain the lock or update the row after it had already been updated. I'm wondering if this could be combined with a RETURNING clause. From the PostgreSQL UPDATE documentation:
If the UPDATE command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) updated by the command.
So if no rows are updated (meaning the status was already marked active by another worker), the worker should assume another worker is handling the job.
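To answer the "0 or 1" question: under PostgreSQL's default READ COMMITTED isolation, a second UPDATE on the same row waits for the first transaction to commit, then re-checks its WHERE clause against the new row version. If the status is no longer 'created', it updates 0 rows and RETURNING yields nothing. The sketch below is an in-memory TypeScript model of that behavior, not a database client; the table name and the 'created'/'active' state values are assumptions for illustration.

```typescript
// Hypothetical in-memory model of the conditional claim query:
//   UPDATE pgboss.job SET state = 'active'
//   WHERE id = $1 AND state = 'created'
//   RETURNING id;
// Postgres serializes concurrent updates of one row; the loser re-evaluates
// the WHERE clause, matches 0 rows, and gets an empty RETURNING set.

type JobState = "created" | "active" | "completed";

interface JobRow {
  id: string;
  state: JobState;
  claimedBy?: string;
}

class JobTable {
  private rows = new Map<string, JobRow>();

  insert(id: string): void {
    this.rows.set(id, { id, state: "created" });
  }

  // Returns the claimed row ("UPDATE 1" plus the RETURNING row),
  // or null ("UPDATE 0") when another worker already claimed it.
  claim(id: string, worker: string): JobRow | null {
    const row = this.rows.get(id);
    if (row === undefined || row.state !== "created") {
      return null;
    }
    row.state = "active";
    row.claimedBy = worker;
    return { ...row };
  }
}

// Three workers react to the same notification; only one wins the claim.
const table = new JobTable();
table.insert("job-1");
const results = ["w1", "w2", "w3"].map((w) => ({ worker: w, row: table.claim("job-1", w) }));
const winners = results.filter((r) => r.row !== null);
```

A worker whose claim returns no row simply moves on, which is exactly the "assume another worker is handling it" rule.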
Another way to do this would be to insert job statuses into a separate table with a unique constraint on (job id, status). The first worker to insert the row in its transaction wins and then processes the job.
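The separate-table idea relies on the unique constraint rather than on row locks. Below is a minimal in-memory sketch, assuming a hypothetical job_status table (the DDL in the comment is illustrative, not part of pg-boss); with INSERT ... ON CONFLICT DO NOTHING RETURNING, a losing worker gets an empty result instead of an error.

```typescript
// Hypothetical schema for the "separate status table" approach:
//   CREATE TABLE job_status (
//     job_id uuid NOT NULL,
//     status text NOT NULL,
//     worker text NOT NULL,
//     UNIQUE (job_id, status)
//   );
//   INSERT INTO job_status (job_id, status, worker) VALUES ($1, 'active', $2)
//   ON CONFLICT (job_id, status) DO NOTHING
//   RETURNING worker;
// Only one INSERT per (job_id, status) can succeed.

class JobStatusTable {
  // The "jobId|status" key stands in for the UNIQUE (job_id, status) index.
  private rows = new Map<string, string>();

  // True if this worker inserted the row (won the job), false on conflict.
  tryInsert(jobId: string, status: string, worker: string): boolean {
    const key = `${jobId}|${status}`;
    if (this.rows.has(key)) {
      return false; // ON CONFLICT DO NOTHING: RETURNING yields no row
    }
    this.rows.set(key, worker);
    return true;
  }

  ownerOf(jobId: string, status: string): string | undefined {
    return this.rows.get(`${jobId}|${status}`);
  }
}

const statuses = new JobStatusTable();
const won = ["w1", "w2"].map((w) => statuses.tryInsert("job-1", "active", w));
```

In a real database a concurrent second INSERT waits for the first transaction; if that one commits, the second does nothing, and if it rolls back, the second inserts and wins.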
Yes. FOR UPDATE locks the next job for whoever wins the race to get it. SKIP LOCKED tells the losers to skip rows that are already locked and look elsewhere for a job, so they don't wait on the same record. RETURNING just returns the acquired job from the same statement, so no extra query is needed to fetch it.
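Putting the three clauses together, a fetch query could look roughly like the one built below. This is a sketch in the style described above, not pg-boss's actual query: the column names (state, startAfter, createdOn, startedOn, data) are assumptions, and the schema name is interpolated directly, so it must come from trusted configuration.

```typescript
interface FetchQuery {
  text: string;
  values: unknown[];
}

// Builds a parameterized claim-and-return query. Schema must be trusted input.
function buildFetchQuery(schema: string, queue: string, batchSize: number): FetchQuery {
  const text = `
    WITH next_job AS (
      SELECT id
      FROM ${schema}.job
      WHERE name = $1
        AND state = 'created'
        AND startAfter <= now()
      ORDER BY createdOn, id
      LIMIT $2
      FOR UPDATE SKIP LOCKED -- lock rows we win; skip rows another worker holds
    )
    UPDATE ${schema}.job AS j
    SET state = 'active', startedOn = now()
    FROM next_job
    WHERE j.id = next_job.id
    RETURNING j.id, j.name, j.data -- hand back the claimed jobs in one round trip
  `;
  return { text, values: [queue, batchSize] };
}

const q = buildFetchQuery("pgboss", "some-queue", 1);
```

Because the lock, the state change, and the read happen in one statement, two workers running this concurrently receive disjoint sets of jobs.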
I'm going to close this out for now. fetch() and complete() offer a polling alternative to subscribe() that may appease the masses for this type of architectural request.
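As a rough illustration of that polling alternative, here is a manual worker loop. The PollingClient interface is hypothetical and only modeled on the fetch()/complete() pair mentioned above; real pg-boss signatures differ between versions, so treat this as a shape to adapt rather than the library's API.

```typescript
interface Job {
  id: string;
  data: unknown;
}

// Hypothetical client shape modeled on fetch()/complete(); adapt to your version.
interface PollingClient {
  fetch(queue: string): Promise<Job | null>;
  complete(id: string): Promise<void>;
}

// Fetch and handle jobs until nothing is ready; returns how many were processed.
async function drainQueue(
  client: PollingClient,
  queue: string,
  handler: (job: Job) => Promise<void>,
): Promise<number> {
  let processed = 0;
  for (;;) {
    const job = await client.fetch(queue);
    if (job === null) {
      return processed;
    }
    // If the handler throws, the job stays claimed; a real worker would report
    // the failure (pg-boss has a fail() method for this).
    await handler(job);
    await client.complete(job.id);
    processed += 1;
  }
}

// Re-run drainQueue every intervalMs; the returned function stops the loop.
function startPolling(
  client: PollingClient,
  queue: string,
  handler: (job: Job) => Promise<void>,
  intervalMs: number,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const tick = async (): Promise<void> => {
    try {
      await drainQueue(client, queue, handler);
    } catch (err) {
      console.error(err);
    }
    if (!stopped) {
      timer = setTimeout(tick, intervalMs);
    }
  };
  timer = setTimeout(tick, 0);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next tick only after a drain finishes (setTimeout rather than setInterval) keeps a slow batch from overlapping with the next poll.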
I also have no idea how to use LISTEN/NOTIFY with subscribe() when jobs use the 'startAfter' option. Polling is still needed.
What's the status of this?
For reference: https://github.com/graphile/worker
You can do this: insert into pgboss.job(name, data, startAfter, singletonKey, singletonon) VALUES('some-queue', '{}', now() + INTERVAL '3s', '22', now() + INTERVAL '1d');
I tried to add listen/notify to pg-boss and I think it's possible.
The idea is simple: when a job with startAfter <= now (i.e. not a delayed job) is created, a trigger emits a NOTIFY event. The pg-boss workers LISTEN for these events and perform a fetch outside the regular polling cycle. The polling cycle is kept for delayed jobs.
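The database side of that idea could look like the DDL generated below. The function, trigger, and channel names are made up for illustration, and the columns (name, startAfter) follow the insert example earlier in the thread; this is not the code from the referenced commit. One useful property: NOTIFY is only delivered when the inserting transaction commits, so a listener never hears about a job it cannot yet see.

```typescript
// Returns hypothetical DDL for a "notify on ready job" trigger.
// Schema and channel must be trusted identifiers; they are interpolated as-is.
function notifyTriggerDdl(schema: string, channel: string): string {
  return `
    CREATE OR REPLACE FUNCTION ${schema}.notify_new_job() RETURNS trigger AS $$
    BEGIN
      -- Only announce jobs that are ready now; delayed jobs are left to polling.
      IF NEW.startAfter <= now() THEN
        PERFORM pg_notify('${channel}', NEW.name);
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER job_notify
    AFTER INSERT ON ${schema}.job
    FOR EACH ROW EXECUTE PROCEDURE ${schema}.notify_new_job();
  `;
}
```

Sending the queue name as the payload lets a worker ignore notifications for queues it does not subscribe to.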
@timgit, could you look at this code in 51771b6? It is not a full solution (for example, I didn't write a database migration), but all tests pass.
Thanks for the reference prototype here! Have a look at #93 for a related discussion. There are trade-offs to consider when using listen/notify in pooled or contentious systems, as well as systems that may already be using listen/notify.
I think the spirit of the request for notify is "I don't want to overload the database with too many requests". If that is the case, this is an optimization to reduce reads so polling can be completely turned off and replaced with a "push" request to tell a subscriber to pick up a job. There are distribution and backpressure concerns that need to be accounted for in a push system which make this a bit harder of a problem to solve than it seems at first.
My first recommendation is to profile your system and adjust polling intervals (even at the per-queue level) to whatever makes the most sense for your application. This usually addresses 99% of the concerns of over-fetching.
"I think the spirit of the request for notify is 'I don't want to overload the database with too many requests'."
That is only part of it. LISTEN/NOTIFY also reduces latency. I do see the point, though.
One option could be to add LISTEN/NOTIFY (perhaps as an opt-in) without changing anything else in the system: keep polling, but when a notification arrives, try an extra fetch right away.
With this approach, I think the system could stay compatible as it is, while letting users configure longer polling intervals (for the startAfter and other cases), because workers would try to pick up jobs as soon as they are created.
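A minimal sketch of that hybrid, assuming a hypothetical fetchAndProcess callback (for example, a loop over fetch()/complete()). NotificationSource mirrors the 'notification' event that a node-postgres Client emits after running LISTEN, but it is declared locally so the sketch stands alone; in practice the LISTEN would run on a dedicated, non-pooled connection, since a pooled client can be handed back and lose its subscription.

```typescript
interface PgNotification {
  channel: string;
  payload?: string;
}

// Structural stand-in for a LISTENing client that emits 'notification' events.
interface NotificationSource {
  on(event: "notification", listener: (msg: PgNotification) => void): unknown;
}

class HybridWorker {
  private readonly fetchAndProcess: () => Promise<void>;
  private running = false; // a fetch cycle is in flight
  private rerun = false; // a trigger arrived during the current cycle
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(fetchAndProcess: () => Promise<void>) {
    this.fetchAndProcess = fetchAndProcess;
  }

  // Shared entry point for poll ticks and notifications. Overlapping triggers
  // are coalesced into one extra cycle instead of running fetches in parallel.
  async trigger(): Promise<void> {
    if (this.running) {
      this.rerun = true;
      return;
    }
    this.running = true;
    try {
      do {
        this.rerun = false;
        await this.fetchAndProcess();
      } while (this.rerun);
    } finally {
      this.running = false;
    }
  }

  start(source: NotificationSource, channel: string, pollIntervalMs: number): void {
    const run = (): void => {
      this.trigger().catch((err) => console.error(err));
    };
    // Push path: fetch right away when a job is announced on our channel.
    source.on("notification", (msg) => {
      if (msg.channel === channel) run();
    });
    // Pull path: polling stays on for delayed (startAfter) jobs and missed notifications.
    this.timer = setInterval(run, pollIntervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

Coalescing matters here: a burst of inserts produces a burst of notifications, and without it each one would start its own fetch against the same queue.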
Could you use generic WAL messages for job tracking and achieve these goals with logical decoding?
What do you think about using triggers with LISTEN/NOTIFY rather than polling for new jobs? I'm testing this out in our application currently.
Refs
https://github.com/brianc/node-postgres/wiki/Client#event-notification
http://bjorngylling.com/2011-04-13/postgres-listen-notify-with-node-js.html
https://blog.andyet.com/2015/04/06/postgres-pubsub-with-json/