timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.04k stars 157 forks

Is this library a push- or pull-based queue? #246

Closed stavalfi closed 3 years ago

stavalfi commented 3 years ago

In this issue, I'm talking about creating tasks that should run at a specific date-time in the future.

https://github.com/graphile/worker is based on pulling jobs from the db. This is extremely slow, with lags of 100ms to 2 seconds, which is unacceptable in real-time apps.

Questions

  1. Will this lib configure the db to notify Node.js when a job should start (without polling Postgres)?
  2. What is the average delay between the expected job trigger time and the actual job trigger time?

Maybe it's a good idea to put the answer to this question in your readme.

Thanks!

timgit commented 3 years ago

pg-boss is similar in architecture to AWS SQS, in that it provides durable storage in postgres along with abstractions in Node.js that allow you to pull jobs off the queues by name. As a convenience, it includes a poller via subscribe(), but the only way to improve the latency of a poller is reducing the polling interval, which has the downside of increasing read traffic. If you like everything about pg-boss except the poller, you could look into using fetch() with your own signaling implementation.
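To make that trade-off concrete, here's a toy sketch in plain Node.js (no pg-boss involved; an in-memory array stands in for the job table, and pollOnce() stands in for fetch()). Worst-case pickup latency is one full polling interval, and every tick costs a read whether or not a job is waiting.

```javascript
// Toy illustration of the poller trade-off. All names here are
// illustrative, not pg-boss API.
const queue = []
let reads = 0

function publish (data) {
  queue.push({ data, publishedAt: Date.now() })
}

// One polling tick: always costs a read, whether or not a job is waiting.
function pollOnce (handler) {
  reads++
  const job = queue.shift()
  if (job) handler(job)
  return Boolean(job)
}

// A poller is just pollOnce on a timer; worst-case pickup latency is one
// full interval, so halving the interval halves latency but doubles reads.
function startPoller (intervalMs, handler) {
  return setInterval(() => pollOnce(handler), intervalMs)
}
```

Halving intervalMs halves the worst-case latency but doubles the read traffic against the store, which is exactly the dial a polling interval exposes.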

I've mentioned the SQS architecture similarity in several issues, and yes, I think I should add this to the readme. :)

stavalfi commented 3 years ago

Thanks for your fast response!

Do you have a fetch() implementation available?

I need to build a fast POC. If not, I'll move to a Redis-based queue (bull/..), but they are full of open bugs, which I tried to avoid :P

Thanks!

timgit commented 3 years ago

The first thing I would try if a poller were a dealbreaker is using an EventEmitter as the signal. For example, the following code is a candidate (keep in mind I'm just typing code into a github issue and I didn't run this, but it conveys the approach I was referring to).

const EventEmitter = require('events')
const signaler = new EventEmitter()

const PgBoss = require('pg-boss')

const boss = new PgBoss('<your connection string>')

await boss.start()

const createJob = async (queue, data) => {
  await boss.publish(queue, data)
  signaler.emit(queue)
}

const onJob = async (queue) => {
  const job = await boss.fetch(queue)

  if(!job) {
    // if you have multiple listeners for distribution, this handler may not win the race
    return
  }

  if(queue === 'my-queue') {
    // todo: implement handler here for this queue
  }

  // mark the job completed so it isn't fetched again
  await boss.complete(job.id)
}

// add as many listeners per queue as needed
signaler.on('my-queue', onJob)

// finally, create the job
await createJob('my-queue', { value: 123 })

Disclaimer: This technique has advantages and disadvantages (most architectural decisions are trade-offs). Push-based systems require some sort of monitoring to determine if a push signal was missed (in the event of a listener crash, for example). I'll leave this complexity up to the reader of course. :) This is ultimately why pg-boss and systems like SQS are pull-based: it keeps operations as simple as possible to reason about. The single-process event-emitting example above doesn't have to concern itself with scaling out to multiple services/processes, which would require a communications layer such as TCP/UDP and all the complexity that introduces.

Can you share here in this issue why your use case demands such a low latency? Usually, it's acceptable to have a brief delay between writing to and reading from a job queue, but I'd like to include this here for the benefit of others who may have a similar use case.

Thanks!

stavalfi commented 3 years ago

Thanks for your great response!

Unfortunately, as you mentioned yourself, this solution does not support multiple instances/multiple readers. If worker1 creates 1000 jobs, the rest of the workers won't take those jobs until the next poll happens.

> Push-based systems require some sort of monitoring to determine if a push signal was missed ... I'll leave this complexity up to the reader of course

That's exactly why we're all looking for a production-ready solution online. Anything short of that has too many limitations for distributed systems / microservices / push-event architectures.

> Can you share here in this issue why your use case demands such a low latency?

My use case is pretty simple: it involves gambling and bets. If the reader receives the signal too late (with a delay of 50ms+), the incoming signal will be invalid because: 1. there are additional computations on every signal and they take time as well; 2. there is a point at which the bet becomes invalid, and it arrives pretty fast (100ms after the bet signal is sent to pg-boss).

Having written out my use case, I'm reconsidering whether persistence/retry/multiple readers are mandatory features for it. If not, maybe I'm better off with web sockets or some other low-latency transport library. Maybe even a service with a single replica and an in-memory cron job would do it. I need to check this as well.
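For reference, that last single-replica, in-memory option can be as small as a setTimeout wrapper. A sketch (scheduleJob/cancelJob are made-up names; nothing here survives a process restart, which is exactly the persistence trade-off in question):

```javascript
// Minimal in-memory scheduler for the single-replica case: run a handler
// at a future Date via setTimeout. Illustrative names only; no durability.
const pending = new Map()
let nextId = 0

function scheduleJob (runAt, handler) {
  const id = ++nextId
  const delayMs = Math.max(0, runAt.getTime() - Date.now())
  const timer = setTimeout(() => {
    pending.delete(id)
    handler()
  }, delayMs)
  pending.set(id, timer)
  return id
}

function cancelJob (id) {
  const timer = pending.get(id)
  if (!timer) return false
  clearTimeout(timer)
  pending.delete(id)
  return true
}
```

Latency here is whatever the Node event loop allows (typically single-digit ms), but a crash drops every pending job, so it only fits if the bet flow can tolerate that loss.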

I'd love to hear your opinion about it.

Thanks!

@amibenson @orfins

timgit commented 3 years ago

I haven't worked on a system with a low-latency requirement like this, but it sounds like you'd want to shop for a system written specifically for low-latency reads/writes. I don't think postgres is going to be the best choice for this, in my opinion.

However, once a bet is won and you have async processing requirements, you could use job queues for those.