rebus-org / Rebus.PostgreSql

:bus: PostgreSQL persistence for Rebus
https://mookid.dk/category/rebus

Sub-optimal full scans while removing records from Postgres #38

Closed · davoustp closed this issue 1 year ago

davoustp commented 1 year ago

Symptom

The Postgres engine is under heavy load, showing high CPU usage (between 0.5 and 1 full CPU core).

Analysis

Connect to the Postgres engine, then run:

-- Enable extended display, easier to read
\x

-- Find the top 10 queries by total execution time
select
  (total_exec_time / 1000 / 3600) as total_hours,
  (total_exec_time / 1000) as total_seconds,
  mean_exec_time as avg_millis,
  calls num_calls,
  queryid,
  query
from pg_stat_statements
order by 1 desc limit 10;

shows:

-[ RECORD 1 ]-+-----------------------------------------------------
total_hours   | 444.0447958717761
total_seconds | 1598561.265138394
avg_millis    | 313.93404576810883
num_calls     | 5092029
queryid       | -7921967617282396635
query         | delete from ElsaRebusTransport \r                   +
              |                 where expiration < clock_timestamp()
-[ RECORD 2 ]-+-----------------------------------------------------
total_hours   | 68.45192336191087
total_seconds | 246426.92410287916
avg_millis    | 0.07892487714468302
num_calls     | 3122297215
queryid       | -7570363479291167647
query         | DELETE from ElsaRebusTransport \r                   +
              | where id = \r                                       +
              | (\r                                                 +
              |     select id from ElsaRebusTransport\r             +
              |     where recipient = $1\r                          +
              |     and visible < clock_timestamp()\r               +
              |     and expiration > clock_timestamp() \r           +
              |     order by priority desc, visible asc, id asc\r   +
              |     for update skip locked\r                        +
              |     limit $2\r                                      +
              | )\r                                                 +
              | returning id,\r                                     +
              | headers,\r                                          +
              | body
-[ RECORD 3 ]-+-----------------------------------------------------
total_hours   | 61.60755191337246
total_seconds | 221787.18688814086
avg_millis    | 0.07108938261809306
num_calls     | 3119835603
queryid       | -5604281935359093637
query         | DELETE from ElsaRebusTransport \r                   +
              | where id = \r                                       +
              | (\r                                                 +
              |     select id from ElsaRebusTransport\r             +
              |     where recipient = $1\r                          +
              |     and visible < clock_timestamp()\r               +
              |     and expiration > clock_timestamp() \r           +
              |     order by priority desc, visible asc, id asc\r   +
              |     for update skip locked\r                        +
              |     limit $2\r                                      +
              | )\r                                                 +
              | returning id,\r                                     +
              | headers,\r                                          +
              | body
...
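
Note that the query above relies on the pg_stat_statements extension; if it is not already available, a typical setup looks like the following (assuming sufficient privileges; the preload setting requires a server restart):

-- postgresql.conf (requires a restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- then, in the database being inspected:
create extension if not exists pg_stat_statements;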

To understand why these queries are so expensive, look at the execution plan:

explain delete from ElsaRebusTransport where expiration < clock_timestamp();

Result:

                                   QUERY PLAN                                    
---------------------------------------------------------------------------------
 Delete on elsarebustransport  (cost=0.00..15127.39 rows=52475 width=6)
   ->  Seq Scan on elsarebustransport  (cost=0.00..15127.39 rows=52475 width=6)
         Filter: (expiration < clock_timestamp())

This is a full table scan, which is obviously the root cause of the high resource consumption.

The main problem is that the query is not sargable (indexes cannot be used): the where clause compares against a non-constant value, clock_timestamp().

This is described in https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT :

clock_timestamp() → timestamp with time zone
Current date and time (changes during statement execution); see Section 9.9.5
Example: clock_timestamp() → 2019-12-23 14:39:53.662522-05

Since its value changes during statement execution, Postgres must evaluate clock_timestamp() for each row to check the condition, so the predicate cannot be satisfied with an index range scan. This is very inefficient, and probably performs a system call for each row on top of it, which is even worse.
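
A quick way to see the difference between the two functions from psql (pg_sleep is only used here to make the drift visible):

-- clock_timestamp() advances while the statement runs: the two values differ
select clock_timestamp() as before, pg_sleep(1), clock_timestamp() as after;

-- now() (same as transaction_timestamp()) is fixed for the transaction: the two values are identical
select now() as before, pg_sleep(1), now() as after;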

Proposed change

The obvious fix is either to compute the current timestamp client-side, or to use another function such as now() (equivalent to transaction_timestamp()), whose value is constant across the entire transaction.

The execution then becomes:

explain delete from ElsaRebusTransport where expiration < now();

Output:

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Delete on elsarebustransport  (cost=0.42..5113.25 rows=16 width=6)
   ->  Index Scan using idx_receive_elsarebustransport on elsarebustransport (cost=0.42..5113.25 rows=16 width=6)
         Index Cond: (expiration < now())

Now it uses the index, which is better, but the cost is still high.

This can be further optimized by adding a dedicated index on the expiration column:

create index if not exists idx_expiration_elsarebustransport on ElsaRebusTransport (expiration);

The execution plan is then:

                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Delete on elsarebustransport  (cost=0.43..1.55 rows=1 width=6)
   ->  Index Scan using idx_expiration_elsarebustransport on elsarebustransport  (cost=0.43..1.55 rows=1 width=6)
         Index Cond: (expiration < now())

The cost is now 1.55, compared to the initial 15127.39: four orders of magnitude lower.

This is easily done in https://github.com/rebus-org/Rebus.PostgreSql/blob/master/Rebus.PostgreSql/PostgreSql/Transport/PostgresqlTransport.cs, where the transport builds these SQL statements.
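
For illustration only, the expired-message cleanup could look roughly like this after the change (a sketch reusing the table name from the plans above, not the exact statement used by the transport):

-- Replace the volatile clock_timestamp() with the stable now(),
-- so the planner can use an index range scan on expiration
delete from ElsaRebusTransport where expiration < now();

-- Supporting index, as proposed above
create index if not exists idx_expiration_elsarebustransport on ElsaRebusTransport (expiration);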

jmkelly commented 1 year ago

I've been wondering if anyone uses the PostgreSQL transport in Rebus, so now I know! When I originally ported the transport from MSSQL years ago I'm not sure why I used clock_timestamp, so I've fixed it as you suggested. Thanks for the detailed analysis, it was very interesting.

sjd2021 commented 1 year ago

@jmkelly Any known reason that this same cleanup query would run ~150x per second and be rolled back every time? It executes successfully, but I guess since this method takes in an existing transaction, something down the line fails and causes even this successful command to be rolled back.

jmkelly commented 1 year ago

@sjd2021 Short answer is I don’t know a reason. In the micro benchmarks I’ve run it’s not showing up.

I do find it odd that within the cleanup method the delete is called in a while loop that exits when no records are deleted. Given this cleanup is called on a 60s timer, I think the loop could be unnecessary.

Do you have an example project I could run up locally to test?

sjd2021 commented 1 year ago

@jmkelly I hope to have a reproducible example soon. I've only used it with Elsa thus far and have a hard time figuring out where the culprit lies.

Thanks for the quick response

mookid8000 commented 1 year ago

Should be fixed by @jmkelly in #39

It's out as Rebus.Postgresql 9.0.0-alpha04 on NuGet.org now 🙂

sjd2021 commented 1 year ago

Do you have an example project I could run up locally to test?

@jmkelly I've submitted a new issue with a repro: https://github.com/rebus-org/Rebus.PostgreSql/issues/43