oban-bg / oban

💎 Robust job processing in Elixir, backed by modern PostgreSQL and SQLite3
https://oban.pro
Apache License 2.0
3.37k stars 313 forks source link

Extremely Slow `check_available` in Basic Engine (also used for pro smart engine) #1138

Closed maennchen closed 3 months ago

maennchen commented 3 months ago

Environment

Current Behavior

After creating a very large number of jobs (10M+), I noticed a very noticeable slow down of the system. I found the following query in the slow query logs:

SELECT DISTINCT o0."queue" FROM "public"."oban_jobs" AS o0 WHERE (o0."state" = $1) AND (NOT (o0."queue" IS NULL))

Expected Behavior

The query should not block for a long time when there's lots of jobs.

Exploration

This query is executed by Oban.Engines.Basic.check_available/1.

The query has multiple issues:

Query Explain: https://explain.dalibo.com/plan/5ea6gaed026672ed

By checking the existence of the queues present in the oban_producers table or just specifying the queues to check if they are available, reduces this down to < 100ms even with millions of jobs.

CREATE INDEX CONCURRENTLY oban_jobs_active_queue_index
    ON oban_jobs (queue) WHERE state = 'available';
CREATE INDEX CONCURRENTLY  oban_producers_queue_index
    ON oban_producers(queue);
SELECT
    oban_producers.queue
    FROM oban_producers
    WHERE EXISTS(
        SELECT
            1
            FROM oban_jobs
            WHERE
                oban_jobs.state = 'available' AND
                oban_jobs.queue = oban_producers.queue
    );

https://explain.dalibo.com/plan/da8e8a68g3gad869

The query is still far from optimal, the index scan on the jobs table is massively over estimating the amount of rows and could still be improved considerable, but is 2 orders of magnitude quicker.

sorentwo commented 3 months ago

We're ahead of you on this one 🙂

Pro v1.5 adds a custom check_available/1 implementation that is vastly faster (< 8ms for 9.6 million jobs). There's a description of the change and a performance comparison table in the CHANGELOG: https://getoban.pro/docs/pro/1.5.0-rc.0/changelog.html#enhancements

queue can never be nil since the column is required

The use of is not null was to force index usage in older versions of Oban where the compound index had queue before state. It's not necessary now, but there are Oban installations out there that still use the old index because it was an optional migration.

maennchen commented 3 months ago

@sorentwo Amazing, will update to the RC immediately then :)

maennchen commented 3 months ago

@sorentwo Works perfectly.

Before fix applied: Screenshot from 2024-08-15 15-33-06

After fix applied: Screenshot from 2024-08-15 17-16-31