An EXPLAIN on the queries in question shows that they are sequence-scanning the table to do the delete:
Delete on device_inbox (cost=8.60..2559182.13 rows=1 width=36)
-> Nested Loop (cost=8.60..2559182.13 rows=1 width=36)
Join Filter: (device_inbox.ctid = "ANY_subquery".ctid)
-> Unique (cost=8.60..8.61 rows=1 width=36)
-> Sort (cost=8.60..8.61 rows=1 width=36)
Sort Key: "ANY_subquery".ctid
-> Subquery Scan on "ANY_subquery" (cost=0.56..8.59 rows=1 width=36)
-> Limit (cost=0.56..8.58 rows=1 width=6)
-> Index Scan using device_inbox_user_stream_id on device_inbox device_inbox_1 (cost=0.56..8.58 rows=1 width=6)
Index Cond: ((user_id = '<user_id>'::text) AND (device_id = 'SAJAWKWGHO'::text) AND (stream_id <= 7535635))
-> Seq Scan on device_inbox (cost=0.00..2376344.90 rows=14626290 width=6)
Apparently the query optimiser isn't smart enough to know that it would do better searching for each of the 100 ctids individually than scanning the whole table.
This code changed recently in https://github.com/matrix-org/synapse/pull/16240 and again in #16312.
A couple of thoughts:
- I'm surprised it was necessary to change the code in the SyncHandler at all. We know that will be deleting at most 100 rows, because they are exactly the rows that were served to the client on the previous /sync. AFAICT there is no need to LIMIT the delete, and therefore no need to mess about with ctid and no need for the fallback-to-async-task (see the sketch below).
- For the cleanup when a device is deleted, we should probably pick a stream_id to delete up to for each batch, rather than messing with ctid.
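For the first point, a minimal sketch of what the /sync-path cleanup could look like with no LIMIT and no ctid juggling. This is an illustration only, assuming the caller passes the up-to stream_id it served on the previous /sync; it is not the actual Synapse code:

DELETE FROM device_inbox
-- At most 100 rows can match, because these are exactly the rows that were
-- served to the client on the previous /sync.
WHERE user_id = ? AND device_id = ? AND stream_id <= ?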
Apparently the query optimiser isn't smart enough to know that it would do better searching for each of the 100 ctids individually than scanning the whole table.
Sadness... https://postgrespro.com/list/thread-id/2446912
For the cleanup when a device is deleted, we should probably pick a stream_id to delete up to for each batch, rather than messing with ctid.
Something along these lines?
DELETE FROM device_inbox WHERE user_id = ? AND device_id = ? AND stream_id <= (
SELECT MAX(stream_id) FROM device_inbox
WHERE user_id = '@<user_id>' AND device_id = 'LTHNOYWSMJ' AND stream_id <= 7526430
LIMIT 100
)
I'm surprised it was necessary to change the code in the SyncHandler at all. We know that will be deleting at most 100 rows, because they are exactly the rows that were served to the client on the previous /sync. AFAICT there is no need to LIMIT the delete, and therefore no need to mess about with ctid and no need for the fallback-to-async-task.
:facepalm: Indeed, I just didn't think about it.
The query planner woes sound similar to https://github.com/matrix-org/sliding-sync/commit/fa6746796c0391ea269485b3167195cc386b43a1, except in that case we wanted a seq scan.
Something along these lines?
DELETE FROM device_inbox WHERE user_id = ? AND device_id = ? AND stream_id <= (
    SELECT MAX(stream_id) FROM device_inbox
    WHERE user_id = '@<user_id>' AND device_id = 'LTHNOYWSMJ' AND stream_id <= 7526430
    LIMIT 100
)
Yes. Except you need an ORDER BY stream_id in the inner query.
And I don't think you need stream_id <= 7526430 for device deletions (we need to delete all messages for the given device, irrespective of stream_id). It's harmless, but redundant.
And I don't think you need stream_id <= 7526430 for device deletions (we need to delete all messages for the given device, irrespective of stream_id). It's harmless, but redundant.
Someone can create a device with the same ID, and without the check it would then delete the messages aimed at the new device.
well, that would be a stupid thing to do :)
But fair enough. Stick a comment in, though - that's quite subtle!
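Putting the two corrections together, a hedged sketch of what the batched device-deletion query might end up looking like, with the LIMIT moved into a nested subquery so that MAX is taken over the batch rather than over all matching rows. The batch size and parameter placeholders are illustrative; this is not claimed to be the exact query Synapse ships:

DELETE FROM device_inbox
WHERE user_id = ? AND device_id = ? AND stream_id <= (
    -- Highest stream_id among the next batch of at most 100 messages.
    -- ORDER BY is needed so that LIMIT picks the lowest 100 stream_ids.
    SELECT MAX(batch.stream_id) FROM (
        SELECT stream_id FROM device_inbox
        WHERE user_id = ? AND device_id = ?
          -- Keep an upper bound on stream_id: a device with the same ID can be
          -- re-created, and we must not delete messages aimed at the new device.
          AND stream_id <= ?
        ORDER BY stream_id
        LIMIT 100
    ) AS batch
)

This appears to be broadly the shape reflected in the #16492 plan quoted below: an InitPlan computing the MAX over a LIMITed index-only scan, then a bitmap scan bounded by that value.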
I don't get the same plan with my deploy on pg15:
Delete on device_inbox (cost=1801.45..1962.75 rows=0 width=0)
-> Nested Loop (cost=1801.45..1962.75 rows=40 width=36)
-> HashAggregate (cost=1801.45..1801.85 rows=40 width=36)
Group Key: "ANY_subquery".ctid
-> Subquery Scan on "ANY_subquery" (cost=1641.08..1801.35 rows=40 width=36)
-> Limit (cost=1641.08..1800.95 rows=40 width=6)
-> Bitmap Heap Scan on device_inbox device_inbox_1 (cost=1641.08..1800.95 rows=40 width=6)
Recheck Cond: ((user_id = '@mxid'::text) AND (device_id = 'DEVICEID'::text))
-> Bitmap Index Scan on device_inbox_user_stream_id (cost=0.00..1641.07 rows=40 width=0)
Index Cond: ((user_id = '@mxid'::text) AND (device_id = 'DEVICEID'::text))
-> Tid Scan on device_inbox (cost=0.00..4.01 rows=1 width=6)
TID Cond: (ctid = "ANY_subquery".ctid)
(12 rows)
With #16492, I get:
Delete on device_inbox (cost=2178.99..2231.08 rows=0 width=0)
InitPlan 1 (returns $0)
-> Aggregate (cost=1646.15..1646.16 rows=1 width=8)
-> Limit (cost=0.67..1645.65 rows=40 width=8)
-> Index Only Scan using device_inbox_user_stream_id on device_inbox device_inbox_1 (cost=0.67..1645.65 rows=40 width=8)
Index Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text))
-> Bitmap Heap Scan on device_inbox (cost=532.83..584.92 rows=13 width=6)
Recheck Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text) AND (stream_id <= $0))
-> Bitmap Index Scan on device_inbox_user_stream_id (cost=0.00..532.83 rows=13 width=0)
Index Cond: ((user_id = '@mat:tout.im'::text) AND (device_id = 'CCUJZYMJLP'::text) AND (stream_id <= $0))
(10 rows)
So it's probably a tad slower on more modern PG, but at least we will (?) not get a seqscan on older ones.
@richvdh could you do a comparison with your test DB to check the new query is ok on PG11?
Also note that my new query deletes 10 rows and not 12; I need to check why.
The indicated rows are the number of lines of the plan, not the number of deleted records :sob:.
I have executed the 2 inner SELECTs and they do return the same number of rows for my test case (125 rows).
I think this was fixed by #16491?
To-device message deletions happen in two places: 1) after they are successfully sent to the target device via /sync; 2) when the target device is deleted. Both were using delete_messages_for_device_txn, which uses a very inefficient query plan on Postgres 11.
That PR changes the /sync case to no longer use delete_messages_for_device_txn and therefore fixes case 1, but does not change the query plan in that function, so it does not fix case 2.
Oh, in fairness, the subject of this issue is just about the /sync requests, so yes, I guess it is fixed. Sorry.
For completeness, the device-deletion side looks to be covered in #16492.
Description
Synchrotron workers are maxing out db connections, resulting in multi-minute scheduling time.
Inspection shows that these are operations of the shape sketched below.
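Roughly, reconstructing from the EXPLAIN output above (the actual query is built by delete_messages_for_device_txn; parameter values here are placeholders):

DELETE FROM device_inbox WHERE ctid IN (
    SELECT ctid FROM device_inbox
    WHERE user_id = ? AND device_id = ? AND stream_id <= ?
    LIMIT 100  -- some of the observed queries use LIMIT 1000 instead
)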
Some queries have LIMIT 1000 rather than LIMIT 100.
Steps to reproduce
device_inbox
/sync
Homeserver
EMS host
Synapse Version
1.94.0
Installation Method
Docker (matrixdotorg/synapse)
Database
postgres 11
Workers
Multiple workers
Platform
k8s
Configuration
No response
Relevant log output
Anything else that would be useful to know?
No response