Closed florinpatrascu closed 4 years ago
You have an external disk merge sort that is taking 2.5gb. I'm guessing that is slow. I'm not sure why it is doing that, however. Can you share your current jobs table structure (specifically the indexes)?
According to this, which is a little old, our use of due_at NULLS FIRST may be going in the opposite direction from the way the index is built (nulls last): https://www.postgresql.org/docs/9.1/indexes-ordering.html
We might need to change the index. Not sure that will fix your particular problem though.
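For anyone who wants to check their own setup, the index definitions can be read straight from the catalog. This is a minimal sketch, assuming the table is named rihanna_jobs as in the index statement quoted in this thread:

```sql
-- List every index on the jobs table together with its full definition,
-- including any explicit ASC/DESC and NULLS FIRST/LAST modifiers.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'rihanna_jobs';
```

PostgreSQL b-tree indexes default to ASC NULLS LAST, so a due_at column listed without modifiers in indexdef is stored nulls last and cannot directly serve an ORDER BY ... due_at ASC NULLS FIRST.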
The table structures and the associated index are the ones created by the migration. We’re using the newest Rihanna version.
But you are confirming our findings. My team found that part of the query being the possible culprit. We created a new index:
CREATE INDEX rihanna_jobs_locking_index ON rihanna_jobs (priority, due_at ASC NULLS FIRST, enqueued_at, id);
and our query duration, for the same data set, dropped below a second!
We’re monitoring our system currently, and I’ll return with a definitive confirmation that this is indeed the fix, but so far it appears to be so, at least for our case :)
Good to hear. Would love to see the new EXPLAIN after that change, if you have it available.
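One way to produce a comparable plan locally is to run EXPLAIN on a query that uses the same ordering. The statement below is a simplified stand-in and an assumption, not the actual WITH RECURSIVE query that Rihanna issues; it only exercises the sort order discussed here, using the column names from the index above:

```sql
-- Hypothetical, simplified query: pick the next job using the same ordering
-- as the locking index. ANALYZE executes the query; BUFFERS adds I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM rihanna_jobs
ORDER BY priority, due_at ASC NULLS FIRST, enqueued_at, id
LIMIT 1;
```

If the output contains a Sort node with "Sort Method: external merge  Disk: ...", the sort spilled to disk. With an index whose ordering matches the ORDER BY, the planner would normally be expected to use an index scan and skip the Sort node altogether.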
sure thing, here it is: https://explain.depesz.com/s/FPZJ
Mind you, to verify this solution we're using a test table with more than 3 times the number of jobs we had when we first discovered the issue: 3,409,126 records (jobs).
I believe it can be optimized even further?!
added the second plan as an optimization to the original - https://explain.depesz.com/s/s8vc, for brevity.
That index scan looks much better.
indeed
Good find. Sounds like the index is the wrong way round indeed. Anybody care to open a PR?
I’m on it. What would you like to do for folks already on this version? Just a doc with the usual bits about using concurrent to avoid downtime?
@tpitale Just a doc is fine.
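For existing installations, the manual steps could look roughly like the sketch below. The names old_locking_index_name and rihanna_jobs_locking_index_new are placeholders chosen for illustration, not names taken from the Rihanna migrations; substitute the name of the index that currently exists on your table.

```sql
-- Run each statement outside a transaction block:
-- CREATE INDEX CONCURRENTLY cannot be executed inside one.

-- 1. Build the correctly ordered index without blocking writes.
--    A temporary name (placeholder) avoids clashing with the existing index.
CREATE INDEX CONCURRENTLY rihanna_jobs_locking_index_new
  ON rihanna_jobs (priority, due_at ASC NULLS FIRST, enqueued_at, id);

-- 2. Drop the old index once the new one has been built successfully.
--    old_locking_index_name is a placeholder for your existing index.
DROP INDEX CONCURRENTLY IF EXISTS old_locking_index_name;

-- 3. Optionally give the new index the final name.
ALTER INDEX rihanna_jobs_locking_index_new
  RENAME TO rihanna_jobs_locking_index;
```

If a concurrent build fails partway, PostgreSQL leaves behind an index marked INVALID, which has to be dropped before retrying. When this is run from an Ecto migration, the module would also need @disable_ddl_transaction true so the statements are not wrapped in a transaction.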
Hi there - we have a table with more than a million jobs, and the WITH RECURSIVE jobs AS... query used by Rihanna.Job.lock/3 is timing out right away. This query takes more than 45 seconds to return, if we relax the dbconnection timeout. Anyone here encountered this situation, and if yes, how are you handling it? Can that query be optimized? Many thanks!
And here is the psql plan, in case it will be useful for troubleshooting: https://explain.depesz.com/s/s8vc