timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

recommendations on scaling horizontally #225

Closed bhsiaoNFLX closed 3 years ago

bhsiaoNFLX commented 3 years ago

We're running into a scaling concern with the nextJob internal query as we're reaching 1M+ jobs and 2M+ archived jobs.

Any wisdom on how to horizontally scale pg-boss DB operations with a proven track record?


bhsiaoNFLX commented 3 years ago

It seems like pg-boss isn't cluster-based out of the box (from a DB perspective), so maybe the answer is as simple as "just don't hammer the DB that much", but I wanted to double check regardless, just in case.

timgit commented 3 years ago

There are a lot of factors to consider here, but here are some general guidelines I use for our largest instances.

  1. Keep the primary job table as small as possible. How large is too large for a relational table in Postgres with MVCC and SKIP LOCKED? If I had to pick one number, it would be "keep the record count under 1 million". I have only seen performance degrade after my job tables have grown into the millions.
  2. Use batching as much as possible. Think batchSize or teamSize, both of which result in fetching batches of jobs in the same fetch interval. This is very much case by case, but it can work very well (see the sketch after this list). For example, I was able to increase job throughput 50x in one of my subscriptions by using batchSize and getting an array of jobs. In my case, each job was able to complete quickly with a shared resource instead of requiring a new resource to be acquired per job.
  3. Adjust fetch intervals as high as tolerable when using subscribe(). If you have read-pressure concerns from too many subscriptions, you can adjust the fetch interval upwards to reduce this. If you would like to make this variable, you can always build your own polling loop with fetch().
  4. Opt out of completion jobs. This is a new addition about to be released in pg-boss 5.2.0, and it gives you the ability to opt out of completion jobs by default using the onComplete config property in the constructor and/or publish(). This will make sure you have the fewest jobs in the queue, based on which ones you intend to subscribe to.
  5. Scale out workers to identify bottlenecks. Depending on your workload, you may be able to increase concurrency by creating more instances connected to your database. This can potentially increase reads if all of your workers are subscribed to the same queues, but it's helpful if you have a backlog and it's taking too long to empty the queues and keep the record counts low.
  6. Partition by schema if needed. Finally, you can even create additional pg-boss instances in your target database, each with its own independent Postgres schema. This could be used to keep really large queues segregated from the primary queue storage so they don't impact its performance.
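
To make points 2-4 concrete, here's a rough sketch of how those options might be wired up together. The connection config, queue names, handlers, and numbers are just placeholders, not recommendations; tune them for your own workload.

const PgBoss = require('pg-boss');

async function main() {
  const boss = new PgBoss({
    connectionString: process.env.DATABASE_URL, // placeholder connection config
    onComplete: false // 5.2.0+: opt out of completion jobs by default (point 4)
  });

  await boss.start();

  // Point 2: fetch a batch of 20 jobs per interval and work them concurrently
  await boss.subscribe('payments', { teamSize: 20, teamConcurrency: 20 }, async job => {
    await processPayment(job.data); // hypothetical handler
  });

  // Point 3: poll less often on queues that can tolerate more latency
  await boss.subscribe('nightly-report', { newJobCheckIntervalSeconds: 30 }, async job => {
    await buildReport(job.data); // hypothetical handler
  });
}

main().catch(console.error);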

In regards to Aurora, I don't think it's compatible since AWS built their own storage layer and I would guess SKIP LOCKED is not implemented. If I'm wrong about that, it's a great thing to be wrong about. :)

bhsiaoNFLX commented 3 years ago

Thank you for the detailed response. My response to each of your points (if you don't have the time, jump to 6.)

  1. Under a million. Got it.
  2. I think we do this already; for reference, we use:
    teamSize: 20,
    teamConcurrency: 20,
  3. When you say "fetch intervals" I'm assuming you're referring to newJobCheckInterval, which we're setting to 100. Let me know if this seems too aggressive.
  4. Glad to know. We're already on 5.0.8, so once 5.2.0 is out, migrating to it should be simple. So I understand it'll be an onComplete: false that we pass in, which will prevent creation of the internal job rows (__state__completed__xx), ergo cutting table size by half, right?
  5. I think we do this already. For us, we have each instance pub/sub to different queues, but they all connect to the same DB instance. Explain something to me: increasing teamConcurrency increases polling back pressure, while scaling out PgBoss workers also empties the queues faster. In other words, it sounds like they achieve the same effect, or am I mistaken here? It would be great if you had an answer for this, because it's been confusing me for a while.
  6. Assuming we're right that we're truly running into a DB bottleneck because we're pumping data in that fast, this is the only one of these options we can and should pursue more seriously. Can you provide more concrete pointers on how to set this up? Is range partitioning the right place to start? Any sample code and wisdom would be appreciated.

In regards to Aurora, I don't think it's compatible since AWS built their own storage layer and I would guess SKIP LOCKED is not implemented. If I'm wrong about that, it's a great thing to be wrong about. :)

Have you actually tried Aurora and verified pg-boss doesn't work on it, or are you just speculating? Regarding SKIP LOCKED, I imagine it'd work because PostgreSQL is PostgreSQL regardless of RDS vs. Aurora. As long as I stick with Postgres 11.4 or above, shouldn't I be just fine? SKIP LOCKED is mentioned in the Postgres 11 documentation.

timgit commented 3 years ago

When you say "fetch intervals" I'm assuming you're referring to newJobCheckInterval, which we're setting to 100. Let me know if this seems too aggressive.

Setting the fetch interval to 100ms results in over-querying for new work for most use cases. It's there to be used in special cases, such as an overridden default for a specific low-latency subscription. Regardless, I can't complain too loudly about an allowed config setting, since pg-boss does permit this value. It's just something to keep in mind in regards to putting read pressure on your database.

Glad to know. We're already on 5.0.8, so once 5.2.0 is out, migrating to it should be simple. So I understand it'll be an onComplete: false that we pass in, which will prevent creation of the internal job rows (__state__completed__xx), ergo cutting table size by half, right?

Yes. Released 5.2.0 earlier today btw.

Explain something to me: increasing teamConcurrency increases polling back pressure while scaling out PgBoss workers also empties the queues faster. In other words it sounds like they achieve the same effect, or am I mistaken here?

pg-boss uses p-map for teamConcurrency, so you could exhaust Node's event loop with really high settings here on a single instance. This is something you should experiment with to determine how your job handlers are behaving, since it's very much case by case. For example, I've seen CPU utilization spike with high concurrency settings in the past on some of my queues because of this.

There are actually 2 other options to consider for scaling out and concurrency.

  1. Use batchSize instead of teamSize. For example, if your handler can convert an array of jobs into a batch operation via Array.map(), this can drastically increase throughput and reduce downstream resource utilization.

  2. Create multiple subscriptions on the same queue. Essentially this is as easy as calling subscribe() twice for the same queue. What this does behind the scenes is set up 2 independent workers, each with their own backpressure loops.

I actually use both of these, along with independent Node processes via docker containers, to give you more "real world" examples of combinations of hosting and techniques.
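
As a rough sketch of the second option (this assumes boss is an already-started PgBoss instance, and the queue name and handleResize handler are placeholders):

// Two independent polling workers on the same queue, each with its own
// backpressure loop, so a slow batch in one doesn't stall the other.
await boss.subscribe('resize-image', { teamSize: 10 }, handleResize);
await boss.subscribe('resize-image', { teamSize: 10 }, handleResize);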

Can you provide more concrete pointers on how to set this up? Is Range Partitioning the right place to start? Any sample code and wisdom would be appreciated.

This code would create 2 pg-boss instances that are completely independent of each other in regards to storage. options in this example is whatever config you're currently passing in:

const boss1 = new PgBoss({ ...options, schema: 'pgboss1' })
const boss2 = new PgBoss({ ...options, schema: 'pgboss2' })
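
For example (queue names and handlers here are hypothetical), you could route a huge, high-churn queue to its own schema while everything else stays on the other:

await boss1.start();
await boss2.start();

// keep the really large queue in its own schema
await boss2.subscribe('bulk-import', { teamSize: 20 }, handleImport);
await boss2.publish('bulk-import', { fileId: 123 });

// everything else stays on the primary schema
await boss1.publish('welcome-email', { userId: 42 });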

I imagine it'd work because PostgreSQL is PostgreSQL regardless of RDS vs. Aurora.

I haven't tried it, but my understanding is that Aurora is an implementation of the postgres query layer (wire protocol), but with a completely different (and proprietary) storage engine. This is similar to other products such as cockroachdb or yugabytedb if you want to think of it like that. SKIP LOCKED was added to postgres to resolve issues with queue-like record locking problems inherent in MVCC, which these other databases don't use.

bhsiaoNFLX commented 3 years ago

Setting the fetch interval to 100ms results in over-querying for new work for most use cases.

What fetch interval would you recommend?

Yes. Released 5.2.0 earlier today btw.

woo great timing! Just upgraded today.

Use batchSize instead of teamSize.

Got it. I will recommend it to my team.

I actually use both of these, along with independent Node processes via docker containers, to give you more "real world" examples of combinations of hosting and techniques.

Nice, didn't know about technique 2. Though if I can provide some feedback, this seems like tribal knowledge and if it's a thing for performance it should be documented or built-in. Same holds true for teamSize vs. batchSize - if one is generally better than the other performance-wise please consider documenting it :)

const boss1 = new PgBoss({ ...options, schema: 'pgboss1' })
const boss2 = new PgBoss({ ...options, schema: 'pgboss2' })

Okay, please explain how this actually improves performance in the scenario where there's one target database and CPU utilization hovers near 100%. If there were two target databases, that would make sense because the CPU utilization would be split between them, but you're saying separating the schema out would yield better performance, and that's the part I'm not getting.

I haven't tried it, but my understanding is that Aurora is an implementation of the postgres query layer (wire protocol), but with a completely different (and proprietary) storage engine. This is similar to other products such as cockroachdb or yugabytedb if you want to think of it like that. SKIP LOCKED was added to postgres to resolve issues with queue-like record locking problems inherent in MVCC, which these other databases don't use.

I see. That seems like bad ergonomics for Postgres. I'll have to take your word for it and revisit at a later time.

timgit commented 3 years ago

What fetch interval would you recommend?

I can't easily answer that question for you. How often would you like to check for jobs? What is a reasonable latency between creating and receiving a job? Answering these questions will help in deciding this. It can be customized per subscription or as a global default.
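
As a hedged sketch of both (reusing the options object from earlier; the values are arbitrary and handleAlert is a placeholder):

// global default for every subscription on this instance
const boss = new PgBoss({ ...options, newJobCheckIntervalSeconds: 5 });

// per-subscription override for one queue that genuinely needs low latency
await boss.subscribe('realtime-alerts', { newJobCheckInterval: 500 }, handleAlert);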

this seems like tribal knowledge and if it's a thing for performance it should be documented or built-in

Guilty. The subscribe() docs don't mention this implementation detail. That is, "subscribe() adds a new polling worker to a collection of workers". I should add this to the docs.

Same holds true for teamSize vs. batchSize - if one is generally better than the other performance-wise please consider documenting it

teamSize is more of a convenience abstraction over batchSize; with batchSize you have to work with the job array directly, manually completing and/or failing jobs. You would need to try both to determine if the extra effort required to use batchSize is worth it.
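
For what it's worth, here's a minimal sketch of that extra effort with batchSize (assuming boss is a started instance; the queue name and writeInvoices are placeholders):

await boss.subscribe('invoice-batch', { batchSize: 100 }, async jobs => {
  try {
    // one downstream call for the whole batch instead of one call per job
    await writeInvoices(jobs.map(job => job.data));
    await Promise.all(jobs.map(job => boss.complete(job.id)));
  } catch (err) {
    await Promise.all(jobs.map(job => boss.fail(job.id, err)));
  }
});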

please explain how this actually improves performance in the scenario where there's one target database and CPU utilization hovers near 100%

I doubt this would help if your db server is out of resources. The only point I was trying to make is that this introduces manual partitioning for all the reasons partitioning is good for a large table.

ilijaNL commented 3 years ago

Perhaps a little bit late, but good to know for others: one way to scale pg-boss horizontally when your system is Postgres performance bound is to create Postgres read replicas, connect pg-boss to those, and use the master only to publish jobs. See https://github.com/bitnami/bitnami-docker-postgresql#setting-up-a-streaming-replication for a possible setup.

fmmoret commented 2 years ago

Wouldn't fetching a job with SKIP LOCKED (plus the subsequent update) require row locks and therefore be master-bound?