timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Slow queries at 3 million+ records #228

Closed: hjr3 closed this issue 3 years ago

hjr3 commented 3 years ago

Hi,

First, thank you for the great package.

We have over 3 million records in our pgboss.job table. Here are the counts by job name (names obfuscated).

job                        count
__state__completed__foo    529784
foo                        529577
bar                        349269
__state__completed__bar    343232
__state__completed__baz    168809
__state__completed__qux    168710
qux                        168700
baz                        168539
__state__completed__qaz    129826
qaz                        129583

I started noticing database performance issues and eventually focused in on the query from fetchNextJob. I simplified the query to:

SELECT id
FROM pgboss.job
WHERE state < 'active'
  AND name LIKE 'foo'
  AND startAfter < now()
ORDER BY priority desc, createdOn, id
LIMIT 1
FOR UPDATE SKIP LOCKED
;

and at times it could take over 1 second to complete. I ran explain/analyze on the query and got:

Limit  (cost=107590.69..107590.70 rows=1 width=34) (actual time=1363.154..1363.156 rows=1 loops=1)
  ->  LockRows  (cost=107590.69..107866.55 rows=22069 width=34) (actual time=1363.153..1363.155 rows=1 loops=1)
        ->  Sort  (cost=107590.69..107645.86 rows=22069 width=34) (actual time=1363.056..1363.058 rows=2 loops=1)
              Sort Key: priority DESC, createdon, id
              Sort Method: quicksort  Memory: 48kB
              ->  Bitmap Heap Scan on job  (cost=45080.36..107480.34 rows=22069 width=34) (actual time=1150.879..1362.812 rows=305 loops=1)
                    Recheck Cond: (state < 'active'::pgboss.job_state)
                    Filter: ((name ~~ 'foo'::text) AND (startafter < now()))
                    Rows Removed by Filter: 35990
                    Heap Blocks: exact=33177
                    ->  BitmapAnd  (cost=45080.36..45080.36 rows=22071 width=0) (actual time=919.122..919.124 rows=0 loops=1)
                          ->  Bitmap Index Scan on job_state  (cost=0.00..7692.55 rows=125883 width=0) (actual time=280.630..280.630 rows=359253 loops=1)
                                Index Cond: (state < 'active'::pgboss.job_state)
                          ->  Bitmap Index Scan on job_name_default  (cost=0.00..37376.52 rows=524262 width=0) (actual time=595.933..595.933 rows=612506 loops=1)
                                Index Cond: (name = 'foo'::text)
Planning Time: 0.271 ms
Execution Time: 1363.934 ms

I created the following index:

CREATE INDEX CONCURRENTLY job_state_name_start_idx ON pgboss.job (
  state,
  name text_pattern_ops,
  startafter
);

and query times are now below 20ms. Here is the new explain/analyze:

Limit  (cost=68306.19..68306.20 rows=1 width=34) (actual time=16.486..16.489 rows=1 loops=1)
  ->  LockRows  (cost=68306.19..68584.02 rows=22227 width=34) (actual time=16.485..16.487 rows=1 loops=1)
        ->  Sort  (cost=68306.19..68361.75 rows=22227 width=34) (actual time=16.465..16.467 rows=1 loops=1)
              Sort Key: priority DESC, createdon, id
              Sort Method: quicksort  Memory: 53kB
              ->  Bitmap Heap Scan on job  (cost=5398.89..68195.05 rows=22227 width=34) (actual time=15.109..16.196 rows=371 loops=1)
                    Recheck Cond: ((state < 'active'::pgboss.job_state) AND (startafter < now()))
                    Filter: (name ~~ 'foo'::text)
                    Heap Blocks: exact=382
                    ->  Bitmap Index Scan on job_state_name_start_idx  (cost=0.00..5393.33 rows=22227 width=0) (actual time=15.027..15.028 rows=822 loops=1)
                          Index Cond: ((state < 'active'::pgboss.job_state) AND (name = 'foo'::text) AND (startafter < now()))
Planning Time: 0.291 ms
Execution Time: 16.528 ms

Are you open to a PR that adds such an index? I also wonder if there is some optimization that could combine the existing job_name index with job_state_name_start_idx.

timgit commented 3 years ago

Thanks for the research and the candidate index. I'll take a look at this. Definitely open to indexing PRs.

timgit commented 3 years ago

I haven't been able to reproduce your before/after yet with the following data load script:


insert into pgboss.job (name)
select 'q' || generate_series::text
from generate_series(1, 5000000);

I'm testing on Postgres 12.6 on Ubuntu to verify, and I see a different query plan being used. It may be related to the name distribution here, since these queue names are more random, but I'd like to find out if you see the same thing.

For example, after running the above load query twice, I have 2 jobs per queue and 10 million records.

query

explain analyze
SELECT id
FROM pgboss.job
WHERE state < 'active'
  AND name LIKE 'q20'
  AND startAfter < now()
ORDER BY priority desc, createdOn, id
LIMIT 1
FOR UPDATE SKIP LOCKED;

produces


Limit  (cost=3.78..3.80 rows=1 width=34) (actual time=0.056..0.058 rows=1 loops=1)
  ->  LockRows  (cost=3.78..3.81 rows=2 width=34) (actual time=0.056..0.056 rows=1 loops=1)
        ->  Sort  (cost=3.78..3.79 rows=2 width=34) (actual time=0.035..0.035 rows=1 loops=1)
              Sort Key: priority DESC, createdon, id
              Sort Method: quicksort  Memory: 25kB
              ->  Index Scan using job_singletonkeyon on job  (cost=0.43..3.77 rows=2 width=34) (actual time=0.023..0.027 rows=2 loops=1)
                    Index Cond: (name = 'q20'::text)
                    Filter: ((state < 'active'::pgboss.job_state) AND (name ~~ 'q20'::text) AND (startafter < now()))
Planning Time: 0.168 ms
Execution Time: 0.081 ms

hjr3 commented 3 years ago

I think this data load script is too simple. I initially thought unique names and many jobs in a completed state would reproduce the issue, but it did not. Let me try to produce a load script that reproduces what we are seeing. If I cannot, I may dump our table (without the data) and share that.

NikolaLukic1 commented 3 years ago

Hello. I'm having the same issue here. I ran explain analyze with the same query on 14M records in the DB, and it took 30 seconds.

Also, when trying to do a count, it takes forever:

SELECT "job"."state" AS State, COUNT(*) AS count 
FROM "pgboss"."job" "job"
WHERE name IN ($1, $2) GROUP BY "job"."state" 

@hjr3 When I added the index like you did, it didn't really help in my case.

I even set onComplete to false so I don't have __state__completed__-prefixed jobs in the DB.
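
For reference, that looks roughly like the sketch below at publish time (the connection string and payload are placeholders, and the publish/subscribe naming and where the onComplete option lives can differ between pg-boss versions):

const PgBoss = require('pg-boss');

// placeholder connection string
const boss = new PgBoss('postgres://user:pass@localhost/app');

async function enqueue() {
  await boss.start();

  // onComplete: false means no __state__completed__<queue> completion job
  // is created when this job finishes, so those rows never pile up.
  await boss.publish('foo', { some: 'payload' }, { onComplete: false });
}

enqueue().catch(console.error);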

timgit commented 3 years ago

How many different queues do you have in the 14mil records? If possible, you may want to increase the number of workers and/or manually partition your queues into dedicated instances (schemas) to keep the record count as low as possible per queue.

It's hard to predict a perfect index across all use cases, since the queue distribution varies per installation. For example, in my app I have thousands of distinct queues, but others may only have a few.
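
To make the schema/worker suggestion concrete, here is a minimal sketch assuming the schema constructor option and the teamSize/teamConcurrency subscribe options (connection string, schema names, queue names, and handlers are all placeholders):

const PgBoss = require('pg-boss');

// Two isolated pg-boss installations in the same database, each with its own
// schema and therefore its own, much smaller, job table.
const ordersBoss = new PgBoss({ connectionString: process.env.DATABASE_URL, schema: 'pgboss_orders' });
const emailsBoss = new PgBoss({ connectionString: process.env.DATABASE_URL, schema: 'pgboss_emails' });

async function start() {
  await Promise.all([ordersBoss.start(), emailsBoss.start()]);

  // More workers per queue: fetch and process several jobs per polling interval.
  await ordersBoss.subscribe('order-created', { teamSize: 5, teamConcurrency: 5 }, handleOrder);
  await emailsBoss.subscribe('send-email', { teamSize: 5, teamConcurrency: 5 }, handleEmail);
}

async function handleOrder(job) { /* ... */ }
async function handleEmail(job) { /* ... */ }

start().catch(console.error);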

hjr3 commented 3 years ago

We continued to see slow job queries that were putting a lot of pressure on our database. We tried to get more aggressive with archiving: archiveCompletedAfterSeconds: 180. This was not working, though. It finally dawned on me that the total number of rows in the job table was not the real issue. The real issue was that we had millions of active jobs:

79230   created
2777895 active
726 completed
7   failed

The vast majority of active jobs started with __state__completed__. I noticed the maintenance jobs were not running. This is because the existing maintenance job was stuck in an active state, which meant that new maintenance jobs could not be created due to the singleton key on that job. I manually cleaned up the jobs stuck in the active state and got the maintenance job running again. I also set up proper monitoring of the states with something like:

boss.on('monitor-states', (monitorStates) => {
  if (monitorStates.active > 10_000) {
    log.error('pg-boss Alert - state sizes are too big');
  }
});

If we ever see this error, we will check to make sure the maintenance job is running again.
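
For completeness, the monitor-states event only fires when state monitoring is enabled on the instance; a minimal sketch, assuming the monitorStateIntervalSeconds constructor option (the 10,000 threshold and the logger are placeholders):

const PgBoss = require('pg-boss');

const boss = new PgBoss({
  connectionString: process.env.DATABASE_URL,
  monitorStateIntervalSeconds: 60 // emit 'monitor-states' once a minute
});

boss.on('monitor-states', (monitorStates) => {
  if (monitorStates.active > 10_000) {
    console.error('pg-boss Alert - state sizes are too big', monitorStates);
  }
});

boss.start().catch(console.error);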

We continued to keep our aggressive archive setting (we log our job results separately). Here is where we now stand with the pgboss.job table:

1371    created
847 active
337 completed
394 expired

We have also started setting newJobCheckInterval for some queues. For example, scheduled jobs that run a few times per hour might have a newJobCheckInterval of 5 minutes, since it is not critical that the job runs exactly on the hour. As the number of our queues continues to grow, this has reduced the number of queries for new jobs.
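
As an illustration, the polling interval is set per subscription; a sketch assuming the newJobCheckIntervalSeconds subscribe option (the queue name and handler are placeholders):

const PgBoss = require('pg-boss');

async function main() {
  const boss = new PgBoss(process.env.DATABASE_URL);
  await boss.start();

  // Infrequent scheduled queue: poll every 5 minutes instead of the default
  // of every few seconds, cutting down the fetch queries against pgboss.job.
  await boss.subscribe('hourly-report', { newJobCheckIntervalSeconds: 300 }, async (job) => {
    // ... handle the job ...
  });
}

main().catch(console.error);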

We have now gone a few weeks without any problems.

NikolaLukic1 commented 3 years ago

The way I optimised the DB is that I set onComplete to false, so there are no __state__completed__-prefixed jobs in the DB at all. The problem is that many jobs are scheduled 1-2 hours out and they stack up, so I separated those from immediately-running jobs into separate queues.

Also, I've set the maintenance and archive intervals to 60s, so the DB stays pretty empty and performance is way better. @hjr3 Thanks for sharing the other optimisation options!
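
In constructor terms that corresponds to something like the sketch below; archiveCompletedAfterSeconds comes from earlier in this thread, while maintenanceIntervalSeconds is my assumption for the "maintenance time" and may be named differently in other pg-boss versions:

const PgBoss = require('pg-boss');

const boss = new PgBoss({
  connectionString: process.env.DATABASE_URL,
  archiveCompletedAfterSeconds: 60, // archive completed jobs after one minute
  maintenanceIntervalSeconds: 60    // run archive/expire maintenance every minute
});

boss.start().catch(console.error);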

Still, an optimised index would be great to help with worst-case scenarios.

jede commented 3 years ago

First of all, thank you @timgit for a fantastic tool! We've been running pg-boss in production for about 5 years now with very good results!

We had a very similar issue to the one described above, with about 1.7M records in the job table. The query for fetching jobs took about a minute to run. We tried vacuuming, but with no success. After removing all __state__completed__ jobs the query ran a bit faster, but still took about 12 s.

explain analyze SELECT id
FROM pgboss.job
WHERE state < 'active'
  AND name LIKE 'main/%'
  AND startAfter < now()
ORDER BY priority desc, createdOn, id
LIMIT 12
FOR UPDATE SKIP LOCKED;
Limit  (cost=116360.11..116360.15 rows=12 width=34) (actual time=12784.713..12784.724 rows=12 loops=1)
  ->  LockRows  (cost=116360.11..117489.27 rows=322618 width=34) (actual time=12437.354..12437.364 rows=12 loops=1)
        ->  Sort  (cost=116360.11..116521.42 rows=322618 width=34) (actual time=12437.182..12437.185 rows=48 loops=1)
              Sort Key: priority DESC, createdon, id
              Sort Method: quicksort  Memory: 514kB
              ->  Seq Scan on job  (cost=0.00..114880.92 rows=322618 width=34) (actual time=7804.633..12363.510 rows=4120 loops=1)
                    Filter: ((state < 'active'::pgboss.job_state) AND (name ~~ 'main/%'::text) AND (startafter < now()))
                    Rows Removed by Filter: 606094
Planning Time: 0.216 ms
JIT:
  Functions: 5
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.934 ms, Inlining 0.000 ms, Optimization 0.420 ms, Emission 346.785 ms, Total 348.139 ms
Execution Time: 12785.724 ms

In our case a combination of removing all __state__completed__ jobs, setting onComplete: false (we're not using this functionality currently), setting archiveCompletedAfterSeconds to 2 hours and adding the index as described by @hjr3 seems to have solved it.

Unlike @hjr3, we did not have a lot of jobs in the active state; our maintenance jobs were working fine.

I think the main issue for us was that the __state__completed__ jobs were kept for 30 days since no one consumed them. We also upgraded Postgres to 13.2 recently; that could be related, but I'm not sure.

Let me know if I can be of any help with debugging this!

timgit commented 3 years ago

I'm planning on adding a filtered version of this index (... WHERE state < 'active') in the next major release. I will close this once it's merged and published.