rails / solid_queue

Database-backed Active Job backend
MIT License

Possibility of making `SolidQueue::Job.clear_finished_in_batches` more performant? #403

Open salmonsteak1 opened 2 weeks ago

salmonsteak1 commented 2 weeks ago

Hey there, I was analyzing some queries being run against my Solid Queue database and I came across this:

DELETE
FROM
  "solid_queue_jobs"
WHERE
  "solid_queue_jobs"."id" IN (
  SELECT
    "solid_queue_jobs"."id"
  FROM
    "solid_queue_jobs"
  WHERE
    "solid_queue_jobs"."finished_at" IS NOT NULL
    AND "solid_queue_jobs"."finished_at" < $1
  LIMIT
    $2)

I believe it relates to the SolidQueue::Job.clear_finished_in_batches code, where I have been using it like so in a recurring job:

SolidQueue::Job.clear_finished_in_batches(finished_before: <some_time>)
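For readers unfamiliar with the method, here is a minimal plain-Ruby sketch (no database, no Solid Queue) of the loop that `clear_finished_in_batches` performs: delete up to `batch_size` finished rows per pass, stopping once a pass deletes fewer than a full batch. The `finished_before:`/`batch_size:` names mirror the gem's keyword arguments, but the body is a simplified stand-in for illustration, not the gem's actual implementation:

```ruby
# Simplified stand-in: each "job" is a hash; the real method issues
# batched DELETE statements against the jobs table instead.
def clear_finished_in_batches(jobs, finished_before:, batch_size: 500)
  deleted_total = 0
  loop do
    # One "batch": up to batch_size jobs finished before the cutoff.
    batch = jobs.select { |j| j[:finished_at] && j[:finished_at] < finished_before }
                .first(batch_size)
    batch.each { |j| jobs.delete(j) }
    deleted_total += batch.size
    # A short batch means nothing eligible is left.
    break if batch.size < batch_size
  end
  deleted_total
end

# 1200 old finished jobs are cleared in three passes (500 + 500 + 200);
# the 10 unfinished jobs are untouched.
now = Time.now
jobs = Array.new(1200) { |i| { id: i, finished_at: now - 3600 } } +
       Array.new(10)   { |i| { id: 1200 + i, finished_at: nil } }
deleted = clear_finished_in_batches(jobs, finished_before: now - 60)
```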

I also ran EXPLAIN ANALYZE on this query, and it seems like it's doing a sequential scan on solid_queue_jobs, filtering for rows older than the time specified in finished_before:

QUERY PLAN
Delete on solid_queue_jobs  (cost=84.52..1473.01 rows=0 width=0) (actual time=3.724..3.725 rows=0 loops=1)
  ->  Nested Loop  (cost=84.52..1473.01 rows=500 width=38) (actual time=0.383..2.057 rows=500 loops=1)
        ->  HashAggregate  (cost=83.96..88.96 rows=500 width=40) (actual time=0.374..0.449 rows=500 loops=1)
              Group Key: "ANY_subquery".id
              Batches: 1  Memory Usage: 105kB
              ->  Subquery Scan on "ANY_subquery"  (cost=0.00..82.71 rows=500 width=40) (actual time=0.012..0.287 rows=500 loops=1)
                    ->  Limit  (cost=0.00..77.71 rows=500 width=8) (actual time=0.005..0.225 rows=500 loops=1)
                          ->  Seq Scan on solid_queue_jobs solid_queue_jobs_1  (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
                                Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))
        ->  Index Scan using solid_queue_jobs_pkey on solid_queue_jobs  (cost=0.56..2.77 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=500)
              Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.723 ms
Trigger for constraint fk_rails_318a5533ed: time=2.681 calls=500
Trigger for constraint fk_rails_39bbc7a631: time=2.883 calls=500
Trigger for constraint fk_rails_4cd34e2228: time=2.044 calls=500
Trigger for constraint fk_rails_81fcbd66af: time=2.731 calls=500
Trigger for constraint fk_rails_9cfe4d4944: time=2.608 calls=500
Trigger for constraint fk_rails_c4316f352d: time=2.897 calls=500
Execution Time: 19.997 ms

More information from query analysis on CloudSQL:

[screenshot: CloudSQL query analysis]

Is it possible to simplify this query? Or is there a timeline for when clearing finished jobs (clear_finished_jobs_after) will happen automatically?

rosa commented 2 weeks ago

Hmm... interesting that you get that query. That code results in the following query for us:

DELETE FROM `solid_queue_jobs` 
WHERE `solid_queue_jobs`.`finished_at` IS NOT NULL 
AND `solid_queue_jobs`.`finished_at` < '2024-11-04 19:42:00.055618' 
LIMIT 1000;

that simply uses the index index_solid_queue_jobs_on_finished_at. Is this PostgreSQL?

salmonsteak1 commented 2 weeks ago

@rosa yes, we're using PostgreSQL

rosa commented 2 weeks ago

Ahh, I realised why this is the case. PostgreSQL doesn't support LIMIT on DELETE queries 😅 I don't have a good alternative for this one at the moment, I'm afraid. I'm not sure from reading the output of your EXPLAIN ANALYZE, but does this mean that the index on finished_at is not being used for the inner query?

Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))
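The two deletion shapes discussed in this thread can be sketched in plain Ruby (no database involved). MySQL can attach LIMIT directly to DELETE, while PostgreSQL cannot, so on PostgreSQL the same Active Record call compiles to DELETE ... WHERE id IN (SELECT id ... LIMIT n). Both shapes remove the same rows; the difference is only in how the database has to plan the statement, which is what the EXPLAIN output earlier in the thread reflects:

```ruby
# MySQL-style: the row scan and the delete happen in one statement
# (DELETE ... WHERE finished_at ... LIMIT n).
def delete_with_limit!(rows, limit)
  doomed = rows.select { |r| r[:finished_at] }.first(limit)
  doomed.each { |r| rows.delete(r) }
  doomed.size
end

# PostgreSQL-style: first collect ids (the inner SELECT ... LIMIT),
# then delete the rows whose id is in that set.
def delete_via_id_subquery!(rows, limit)
  ids = rows.select { |r| r[:finished_at] }.first(limit).map { |r| r[:id] }
  before = rows.size
  rows.reject! { |r| ids.include?(r[:id]) }
  before - rows.size
end

# Both strategies remove the same three finished rows (ids 0, 2, 4).
dataset = -> { (0...10).map { |i| { id: i, finished_at: i.even? ? Time.now : nil } } }
a = dataset.call
b = dataset.call
delete_with_limit!(a, 3)
delete_via_id_subquery!(b, 3)
```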
salmonsteak1 commented 2 weeks ago

> Ahh, I realised why this is the case. PostgreSQL doesn't support LIMIT on DELETE queries 😅 I don't have a good alternative for this one at the moment, I'm afraid. I'm not sure from reading the output of your EXPLAIN ANALYZE, but does this mean that the index on finished_at is not being used for the inner query?
>
> Seq Scan on solid_queue_jobs solid_queue_jobs_1 (cost=0.00..621188.83 rows=3996682 width=8) (actual time=0.004..0.201 rows=500 loops=1)
> Filter: ((finished_at IS NOT NULL) AND (finished_at < now()))

Yes, I believe that's the case - I can't see much of a hit to the performance of our database right now, but it's a shame that PostgreSQL does it this way.