timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
2.07k stars 157 forks source link

Timescaledb #354

Closed ilijaNL closed 1 year ago

ilijaNL commented 1 year ago

Hey there, thanks for this great library.

I am looking into to optimize the query performance of the jobs table and wonder if someone (@timgit ) did expirement with timescaledb. I think if you make some assumptions about how long a job can take and when it expires, you can improve the performance by a large amount when choosing correct chunks.

Additionally, these assumptions can be used now to optimize the fetchNextJob query, e.g.

WITH nextJob as (
      SELECT id
      FROM ${schema}.job
      WHERE state < '${states.active}'
        AND name LIKE $1
        AND startAfter < now() AND startAfter > now() - interval '2 days' -- assume jobs cannot be older than 2 days 
      ORDER BY priority desc, createdOn, id
      LIMIT $2
      FOR UPDATE SKIP LOCKED
    )
    UPDATE ${schema}.job j SET
      state = '${states.active}',
      startedOn = now(),
      retryCount = CASE WHEN state = '${states.retry}' THEN retryCount + 1 ELSE retryCount END
    FROM nextJob
    WHERE j.id = nextJob.id
    RETURNING ${includeMetadata ? 'j.*' : 'j.id, name, data'}, EXTRACT(epoch FROM expireIn) as expire_in_seconds

Any thoughts?

timgit commented 1 year ago

Hey there. We're actually using timescale for some of our production workloads, but definitely not for pg-boss managed queue tables. First of all, there are features that won't work without unique constraints spanning tsdb chunks (tables). Timescale resolves scalability use cases over very large data sets. You would have issues in pg-boss far before you reach the scale where a partitioning solution like timescale would offer significant benefits. If you find yourself needing millions of jobs and you're interested in partitioning, I recommend manually partitioning using independent schemas in pg-boss to split apart the high-volume queues from each other.

ilijaNL commented 1 year ago

I see, the singleton constraints won't work. Hmm perhaps for less feature rich postgres job queue it can be beneficial to use timescaledb if you have strong assumptions about the time.

Offtopic I wonder what software you are developing which use timescaledb and pgboss

timgit commented 1 year ago

We use timescale for our time series workloads, not for pg-boss queues.

ilijaNL commented 1 year ago

I think you misunderstood me. I meant, what company are you working and what kind of software you are developing (with help of pgboss and timescaledb)

timgit commented 1 year ago

Aha. I'm working on the Datto EDR product atm.