Open richvdh opened 6 years ago
And... some time after starting Synapse (ver 1.22.1), having deleted all unreferenced state groups beforehand, I see that they are increasing again:
after ~70 mins - already 41 new unreferenced state groups 😕
after ~80 mins - 56 🙄
after ~85 mins - 84 🤔
after 10 hours - 866 😧
after 12 hours - 959 😱
after 1 day - 4137 🙊
after 2 days - 8291 😫
after 3 days - 11772 😠
after 6 days - 19128 🤬
after 30 days - 85765 😵
Here is the row data; maybe it helps with solving the issue:
# select * from state_groups sg
left join event_to_state_groups esg on esg.state_group=sg.id
left join state_group_edges e on e.prev_state_group=sg.id
where esg.state_group is null and e.prev_state_group is null LIMIT 50;
id | room_id | event_id | event_id | state_group | state_group | prev_state_group
---------+-------------------------------------+----------------------------------------------+----------+-------------+-------------+------------------
4327686 | !OGEhHVWSdvArJzumhm:matrix.org | $yYEinPetrv6QubrPe3WyYuw4ZryR4_2ZS0WJEJx_sgQ | | | |
4410955 | !OGEhHVWSdvArJzumhm:matrix.org | $0IjnpRYlooxiB2du3TnkRB356wN8T5aPaNCazwofLlg | | | |
4365188 | !OGEhHVWSdvArJzumhm:matrix.org | $D_sjOwboZqzybgXM1av_2SOcT3Qu9ZVKTDEi5APFsV4 | | | |
4410956 | !OGEhHVWSdvArJzumhm:matrix.org | $nhACdx2me6ucjlqCLR3OhUNnttYVlW7C_wpr3NzUges | | | |
4410957 | !OGEhHVWSdvArJzumhm:matrix.org | $pUUsHbXGn5kWxijjl-8opsIUo37doeXncDPTTTvNR2A | | | |
4758433 | !OGEhHVWSdvArJzumhm:matrix.org | $lYMoYrgV3zf_L87_TZv25Ni6tVewCdUfunFdpyTKx6U | | | |
4412687 | !OGEhHVWSdvArJzumhm:matrix.org | $X68oeCEcXDPhEv9htrPmVJm_EQgXvOQSsJtGEARo6yU | | | |
4600440 | !OGEhHVWSdvArJzumhm:matrix.org | $oUE3GQ6JlfGCSIwDzCtHgnrTSfOsBFxhkw9keVAXOV8 | | | |
4746472 | !OGEhHVWSdvArJzumhm:matrix.org | $HOQQHE9nPSgzw5j65iAhKG5887xxhlNzGzMubHue3Fg | | | |
4697006 | !OGEhHVWSdvArJzumhm:matrix.org | $f76OhzS-F4tX7ktPAHPPqHAC6eo6oQwT9HQhEFPr8yw | | | |
4746825 | !OGEhHVWSdvArJzumhm:matrix.org | $xg_uSduCltc8gxmJesBOyPFJjdxuMsyPORrubCgTU2c | | | |
4758134 | !OGEhHVWSdvArJzumhm:matrix.org | $p1icjpav000cfURGK3hTe8P68dpUVHO7WVxHhOH93SI | | | |
4758207 | !OGEhHVWSdvArJzumhm:matrix.org | $pztRa2UCtiB3ib__7P51K2yy5L6H0q6fnHtoTDs1um4 | | | |
4769079 | !OGEhHVWSdvArJzumhm:matrix.org | $iUNRQR5iuVCSMVg3WMlhJEneb846Qp_0kpS_jAFcpO8 | | | |
5080883 | !OGEhHVWSdvArJzumhm:matrix.org | $4CyTFW1WLPK8DFVKasFgr7_t-NnNaSXByU_BTkYCh5Y | | | |
5044875 | !OGEhHVWSdvArJzumhm:matrix.org | $sGelAIhSGmpjVmdhoXpN-0ZorM_Ec0UYtWYB360--dU | | | |
5044876 | !OGEhHVWSdvArJzumhm:matrix.org | $DlzCasn1WdSJRRk1l6H0Xb07aLWmk0iWyUnxluDA5Go | | | |
5086546 | !NIrfttwcKodEknQLjd:matrix.org | $16055607512067694yeChw:matrix.org | | | |
5086547 | !NIrfttwcKodEknQLjd:matrix.org | $160556193111gWDoy:ru-matrix.org | | | |
5086557 | !NIrfttwcKodEknQLjd:matrix.org | $16055607502067691GUcmD:matrix.org | | | |
5086561 | !OGEhHVWSdvArJzumhm:matrix.org | $dEa_lYqqfX8lIUl3YOrTUvywauFOvbuLISGhGOI4N1s | | | |
5086584 | !ZPbDMeSXDwjIJrdCvo:fam-ribbers.com | $F1aPxRp1R-EZNhU25UxWoKaxpx4SaP2eefPzL2R3JaQ | | | |
5086593 | !BbUcoMPQhtUNJexFlJ:matrix.org | $16055585992537808zVYXF:matrix.org | | | |
5086609 | !NIrfttwcKodEknQLjd:matrix.org | $160556215412xFFsn:ru-matrix.org | | | |
5086616 | !BvarTFnpDHTUVRxQwu:matrix.org | $16055560362515459ROhhB:matrix.org | | | |
5086638 | !uGIfZszIHlNXqoKhsk:matrix.org | $16055534762012017JnEjl:matrix.org | | | |
5086663 | !aUhETchlgthwWVQzhi:matrix.org | $16055618171350PcYjG:gnome.org | | | |
5086671 | !BbUcoMPQhtUNJexFlJ:matrix.org | $16055586012537820pjVWQ:matrix.org | | | |
5086673 | !aUhETchlgthwWVQzhi:matrix.org | $16055622111381bcvHO:gnome.org | | | |
5086675 | !NIrfttwcKodEknQLjd:matrix.org | $160556245813PvTGe:ru-matrix.org | | | |
5086676 | !aUhETchlgthwWVQzhi:matrix.org | $16055622311382JKtpN:gnome.org | | | |
5086677 | !aUhETchlgthwWVQzhi:matrix.org | $16055622341383vOTRv:gnome.org | | | |
5086678 | !OGEhHVWSdvArJzumhm:matrix.org | $2oJpJwvz_c36fbPUZgOwrOUNoBaypS5fi6Rnet1EsuU | | | |
5086679 | !aUhETchlgthwWVQzhi:matrix.org | $16055622451384HlyEA:gnome.org | | | |
5086680 | !lrZtdjyLpBmoKbMdyx:mozilla.org | $qXv75h0GvD5MtqX5-UWSYBXl8i2zRSCsYi3TF3Y-vGE | | | |
5086681 | !aUhETchlgthwWVQzhi:matrix.org | $16055622591386QbcYq:gnome.org | | | |
5086701 | !NIrfttwcKodEknQLjd:matrix.org | $160556283715UuFlm:ru-matrix.org | | | |
5086703 | !aUhETchlgthwWVQzhi:matrix.org | $16055623001387JPVkI:gnome.org | | | |
5086705 | !aUhETchlgthwWVQzhi:matrix.org | $16055623311388BczeB:gnome.org | | | |
5086707 | !aUhETchlgthwWVQzhi:matrix.org | $16055623441389QOrgx:gnome.org | | | |
5086709 | !aUhETchlgthwWVQzhi:matrix.org | $16055623771391tcQMb:gnome.org | | | |
5086719 | !NIrfttwcKodEknQLjd:matrix.org | $160556305617himTq:ru-matrix.org | | | |
5086724 | !aUhETchlgthwWVQzhi:matrix.org | $16055623941392vfKbe:gnome.org | | | |
5086731 | !aUhETchlgthwWVQzhi:matrix.org | $16055624221393BUDrY:gnome.org | | | |
5086754 | !qLGckkFpSocBZeGLei:matrix.org | $16055390301908741TYzaG:matrix.org | | | |
5086756 | !BbUcoMPQhtUNJexFlJ:matrix.org | $16055594232546922ICQYW:matrix.org | | | |
5086762 | !ZPbDMeSXDwjIJrdCvo:fam-ribbers.com | $qycTeGZNxwU4igvIPcVZ2Drj_lZTL5al3K_r1xER3cY | | | |
5086764 | !OGEhHVWSdvArJzumhm:matrix.org | $Sw_UT08x6uuLDG_g_pczQxtW-MfENsBobgIblEHb1t8 | | | |
5086765 | !lrZtdjyLpBmoKbMdyx:mozilla.org | $4ssmrAtjiw1sxJ31oxsBzfYvyDe8oU10U8CO2j4OhSw | | | |
5086766 | !aUhETchlgthwWVQzhi:matrix.org | $16055624471396NglJI:gnome.org | | | |
And grouped by room_id:
# select sg.room_id, count(sg.event_id) from state_groups sg
left join event_to_state_groups esg on esg.state_group=sg.id
left join state_group_edges e on e.prev_state_group=sg.id
where esg.state_group is null and e.prev_state_group is null group by sg.room_id order by count(sg.event_id) desc;
room_id | count
-------------------------------------+-------
!aUhETchlgthwWVQzhi:matrix.org | 876
!OGEhHVWSdvArJzumhm:matrix.org | 560
!BvarTFnpDHTUVRxQwu:matrix.org | 530
!NIrfttwcKodEknQLjd:matrix.org | 514
!ZPbDMeSXDwjIJrdCvo:fam-ribbers.com | 384
!BbUcoMPQhtUNJexFlJ:matrix.org | 308
!PiiKkGTcBDLmPnxhoT:gnuradio.org | 183
!uGIfZszIHlNXqoKhsk:matrix.org | 144
!BZXSzdvTmoFEHBAvBq:matrix.org | 109
!IFsGjAdhpzPRmreDZz:matrix.org | 76
!lrZtdjyLpBmoKbMdyx:mozilla.org | 68
!xCOiAdRoEHGPYTYtnm:matrix.org | 64
!qLGckkFpSocBZeGLei:matrix.org | 63
!zUxwGnFkUyycpxeHeM:matrix.org | 46
!gCImQEzsJzSLNhiktV:matrix.org | 31
!GibBpYxFGNraRsZOyl:matrix.org | 22
!rusSJndyAbjDmlzXIc:stratum0.org | 21
!nKJQdpqGogdsStyswc:matrix.org | 11
!AdKXuiUbMefaStPgBV:darkfasel.net | 11
!GXzvSpZpCVUoCDaGCE:ru-matrix.org | 11
!OqxbaQRNrPXglOCjpL:outcasts.win | 11
!NFBkIKQHXkwtmAyxaE:permaweb.io | 9
!YynUnYHpqlHuoTAjsp:matrix.org | 8
!GXRbborTRYLpfKpRpK:matrix.org | 8
!QtykxKocfZaZOUrTwp:matrix.org | 8
!KMJVJwvXntWbNumALm:ru-matrix.org | 7
!fJlCtKdpMHgXoEvxTO:matrix.org | 5
!joxsyRkUcrElcVOMHt:matrix.org | 5
!AAAANTUiY1fBZ230:zemos.net | 4
!CxdTjqASmMdXwTeLsR:matrix.org | 4
!pSuuPIojqelFJTESis:privacytools.io | 4
!uDQoIebqsjEEtmWLrO:disroot.org | 3
!YTvKGNlinIzlkMTVRl:matrix.org | 3
!XlPHtPxxaMXFeLKjvo:matrix.org | 3
!SEgsRQLScqPxYtucHl:archlinux.org | 3
!vsLSrLWgitKEvRImzU:matrix.org | 3
!oESsRqyaQizmKFpmJL:midov.pl | 2
!PWxnIIDhCBAbNItsSN:matrix.org | 2
!hMnHOOhQlPdWOEmCYH:ru-matrix.org | 2
!NnggCmfgXzKovaFYkN:ru-matrix.org | 2
!jrrfdpcKPllUsyVuBf:matrix.org | 2
...
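For readers unfamiliar with the query used throughout this thread: it is an anti-join that keeps only state groups that no event points at (via event_to_state_groups) and that are not the prev_state_group of any other group. Below is a minimal sketch on toy data, using SQLite through Python's standard library instead of Postgres. The table and column names are the ones visible in the query and its output above; the room IDs, event IDs and the build_toy_db helper are made up for illustration.

```python
import sqlite3

# Toy schema: only the columns the thread's query touches (an assumed subset
# of the real Synapse tables).
SCHEMA = """
CREATE TABLE state_groups (id INTEGER PRIMARY KEY, room_id TEXT, event_id TEXT);
CREATE TABLE event_to_state_groups (event_id TEXT, state_group INTEGER);
CREATE TABLE state_group_edges (state_group INTEGER, prev_state_group INTEGER);
"""

# Same query as in the thread: a state group is "unreferenced" when no event
# maps to it and no other state group uses it as its previous group.
UNREFERENCED_BY_ROOM = """
SELECT sg.room_id, count(sg.event_id) FROM state_groups sg
LEFT JOIN event_to_state_groups esg ON esg.state_group = sg.id
LEFT JOIN state_group_edges e ON e.prev_state_group = sg.id
WHERE esg.state_group IS NULL AND e.prev_state_group IS NULL
GROUP BY sg.room_id ORDER BY count(sg.event_id) DESC
"""


def build_toy_db():
    """Hypothetical fixture: five state groups, two of them referenced."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO state_groups VALUES (?, ?, ?)",
        [
            (1, "!a:example.org", "$e1"),  # referenced by an event
            (2, "!a:example.org", "$e2"),  # referenced as prev of group 3
            (3, "!a:example.org", "$e3"),  # unreferenced
            (4, "!b:example.org", "$e4"),  # unreferenced
            (5, "!a:example.org", "$e5"),  # unreferenced
        ],
    )
    conn.execute("INSERT INTO event_to_state_groups VALUES ('$e1', 1)")
    conn.execute("INSERT INTO state_group_edges VALUES (3, 2)")
    conn.commit()
    return conn


def unreferenced_by_room(conn):
    """Return (room_id, count) rows, largest count first."""
    return conn.execute(UNREFERENCED_BY_ROOM).fetchall()


print(unreferenced_by_room(build_toy_db()))
```

Group 2 is excluded even though no event maps to it, because group 3 still builds on it through state_group_edges.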
I observe a behavior similar to the one described by @MurzNN, but not as fast:
Maybe the source of this problem is the wrong collation of the Postgres database? I see this warning in the Synapse logs:
synapse.storage.engines.postgres - 67 - WARNING - None - Database has incorrect collation of 'en_US.UTF-8'. Should be 'C'
and looking at #6696 - maybe this can produce duplicated rows in some tables?
@MurzNN I am sorry, but I am currently deleting 127882784 unreferenced groups (am I still leading?) and I couldn't find this warning in my log...
P.S. I specifically set the collation to 'en_US.UTF-8' instead of 'C' to solve the case-insensitive search issue https://github.com/matrix-org/synapse/issues/3116
synapse=# select count(*) from state_groups sg
synapse-# left join event_to_state_groups esg on esg.state_group=sg.id
synapse-# left join state_group_edges e on e.prev_state_group=sg.id
synapse-# where esg.state_group is null and e.prev_state_group is null;
count
--------
379015
(1 row)
Yeah, seeing this here too, though apparently not as bad as for some others
synapse=# select count(*) from state_groups sg
synapse-# left join event_to_state_groups esg on esg.state_group=sg.id
synapse-# left join state_group_edges e on e.prev_state_group=sg.id
synapse-# where esg.state_group is null and e.prev_state_group is null;
count
--------
170646
(1 row)
Also, interestingly, the three top-scoring rooms on my instance are notification rooms with only a single user and a local bot in them:
synapse=# select sg.room_id, count(sg.event_id) from state_groups sg
left join event_to_state_groups esg on esg.state_group=sg.id
left join state_group_edges e on e.prev_state_group=sg.id
where esg.state_group is null and e.prev_state_group is null group by sg.room_id order by count(sg.event_id) desc limit 25;
room_id | count
---------------------------------------+-------
!zxigtYLiXfDfkxIOSF:kittenface.studio | 49911
!ezgNRweyDkNswAKHcV:cadair.com | 49305
!dlZxAWNCYPIrGXeYLa:kittenface.studio | 48565
!GibBpYxFGNraRsZOyl:matrix.org | 10200
!QtykxKocfZaZOUrTwp:matrix.org | 2091
!OGEhHVWSdvArJzumhm:matrix.org | 954
!eUGMvloIjhBoAwlyRh:matrix.org | 808
!BAXLHOFjvDKUeLafmO:matrix.org | 663
!TdAwENXmXuMrCrFEFX:maunium.net | 655
!srjmqVJuXuEfzElJfN:matrix.org | 628
!joxsyRkUcrElcVOMHt:matrix.org | 507
!BfPzMJuQMaOmBzFOdD:matrix.org | 452
!mjbDjyNsRXndKLkHIe:matrix.org | 406
!qBFNwucQebGPQldAnq:matrix.org | 400
!KBzaRpFtQBjSLGqEFq:matrix.org | 361
!YTvKGNlinIzlkMTVRl:matrix.org | 302
!cAioMcJbWpHXSOPwuW:matrix.org | 286
!eUlsDjrRSHURuXlzIN:matrix.org | 274
!FPUfgzXYWTKgIrwKxW:matrix.org | 262
!EyzrwMncomOaAutnKH:matrix.org | 219
!iyIlInqJyxXrRmRHFx:matrix.org | 215
!MbRaSiMIRhhxDtJENL:maunium.net | 206
!ZPbDMeSXDwjIJrdCvo:fam-ribbers.com | 190
!BsnSQSkXzTpoPmrTZt:matrix.org | 180
!ping-v6:maunium.net | 174
(25 rows)
After some time, the count of unreferenced state groups on the ru-matrix.org homeserver decreased automatically from more than 85765 to 23915 (we are now on Synapse 1.40). Here are the top rooms:
synapse_rmo2=# select sg.room_id, count(sg.event_id) from state_groups sg
synapse_rmo2-# left join event_to_state_groups esg on esg.state_group=sg.id
synapse_rmo2-# left join state_group_edges e on e.prev_state_group=sg.id
synapse_rmo2-# where esg.state_group is null and e.prev_state_group is null group by sg.room_id order by count(sg.event_id) desc limit 25;
room_id | count
-----------------------------------+-------
!OGEhHVWSdvArJzumhm:matrix.org | 2059
!moPMuKgsODcokCkUpq:hielle.com | 1475
!ehXvUhWNASUkSLvAGP:matrix.org | 809
!AkMxTNaaaxjawWEjiO:ru-matrix.org | 587
!YynUnYHpqlHuoTAjsp:matrix.org | 550
!YTvKGNlinIzlkMTVRl:matrix.org | 538
!rdYExDEryuZDDpEaRl:ru-matrix.org | 538
!AAAANTUiY1fBZ230:zemos.net | 452
!ping-v6:maunium.net | 444
!xYvNcQPhnkrdUmYczI:matrix.org | 389
!mjbDjyNsRXndKLkHIe:matrix.org | 372
!TdAwENXmXuMrCrFEFX:maunium.net | 359
!yUlsMHhIwEptPzkvfU:maunium.net | 347
!yomrOFwgFXzmeMAbzX:matrix.org | 342
!mTHKuEkFXCRlWgQUaY:elequin.io | 341
!rCWNvpCTZHQkiRYUDE:matrix.org | 335
!IwPxifXSjBLghpVEMh:matrix.org | 321
!DeRvyHqkqkIBbBtwsO:matrix.org | 301
!CrUOgmZrUHLkbkWvae:matrix.org | 300
!jhpZBTbckszblMYjMK:matrix.org | 298
!yjVgJUsMLYLAONsoii:jki.re | 296
!KzHjwnhdwaLkjewCCh:matrix.org | 294
!AZozoWghOYSIAfaZjJ:matrix.org | 288
!pMBteVpcoJRdCJxDmn:matrix.org | 286
!EVeQfgLsbiSsBRXEqs:ru-matrix.org | 284
(25 rows)
Have any improvements been made on the Synapse side for this?
cactus.chat is heavily affected by this, so it is probably caused by bots/bridges/appservices. Our homeserver is strange, because there are no users, only guests and mostly room.membership events.
synapse=# select count(*) from state_groups sg
synapse-# left join event_to_state_groups esg on esg.state_group=sg.id
synapse-# left join state_group_edges e on e.prev_state_group=sg.id
synapse-# where esg.state_group is null and e.prev_state_group is null;
count
---------
6504021
Guests count as users
I have about 1.5 million unreferenced state groups right now, is there a recommended way to deal with them?
The recommended way to remove unreferenced state groups is via https://github.com/erikjohnston/synapse-find-unreferenced-state-groups
The README for that tool still says "Do not blindly delete all the state groups that are returned by this tool" though.
Indeed. Shut down synapse first. Or omit the last, say, 100 results from that tool.
I didn't say it was a good solution to the problem. Just that it's a way to deal with it.
So, as long as synapse is not running during the whole cleanup process, the output of the tool can be used blindly?
Yes.
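The advice above (run the cleanup only while Synapse is stopped, or leave the last ~100 results alone) can be sketched as follows. This is a toy illustration on SQLite, not the tool's own procedure: the three tables are simplified stand-ins with an assumed column subset, delete_state_groups and build_toy_db are hypothetical helpers, and "last" is assumed to mean the highest-numbered (most recently created) state group IDs.

```python
import sqlite3

# Simplified stand-ins for the Synapse tables involved (assumed column subset).
SCHEMA = """
CREATE TABLE state_groups (id INTEGER PRIMARY KEY, room_id TEXT, event_id TEXT);
CREATE TABLE state_group_edges (state_group INTEGER, prev_state_group INTEGER);
CREATE TABLE state_groups_state (state_group INTEGER, type TEXT, state_key TEXT, event_id TEXT);
"""


def delete_state_groups(conn, group_ids, skip_last=0):
    """Delete the given state groups, leaving the newest `skip_last` untouched.

    Assumption: higher IDs are newer, so skipping the tail of the sorted list
    avoids groups that a running server might still be about to reference.
    Returns the IDs that were actually deleted.
    """
    ids = sorted(group_ids)
    if skip_last:
        ids = ids[:-skip_last]
    rows = [(i,) for i in ids]
    with conn:  # single transaction: commits on success, rolls back on error
        conn.executemany("DELETE FROM state_groups_state WHERE state_group = ?", rows)
        conn.executemany("DELETE FROM state_group_edges WHERE state_group = ?", rows)
        conn.executemany("DELETE FROM state_groups WHERE id = ?", rows)
    return ids


def build_toy_db():
    """Hypothetical fixture: group 1 is kept; 10, 11 and 12 are candidates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO state_groups VALUES (?, '!a:example.org', ?)",
        [(1, "$e1"), (10, "$e10"), (11, "$e11"), (12, "$e12")],
    )
    conn.executemany("INSERT INTO state_group_edges VALUES (?, 1)", [(10,), (11,), (12,)])
    conn.executemany(
        "INSERT INTO state_groups_state VALUES (?, 'm.room.member', '@u:example.org', '$m')",
        [(10,), (11,), (12,)],
    )
    conn.commit()
    return conn
```

With Synapse shut down for the whole cleanup, skip_last can stay at 0. The skip only matters if the server keeps running and may still attach events to the newest groups.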
I seem to have them as well. I created that table on my system as well and get the following response:
synapse=# SELECT COUNT(*) FROM state_groups_to_drop;
count
-------
2272
(1 row)
Just to note: I have not run any purge commands yet
167 GB (1 row)
I came here to brag about my database:
| Name | Room ID | Current State Events |
|---|---|---|
| My Milf Waifu . com | !twohoPqivntpjGWCJZ:matrix.org | 152786 |
| Matrix HQ | !OGEhHVWSdvArJzumhm:matrix.org | 102132 |
| Python | !iuyQXswfjgxQMZGrfQ:matrix.org | 56007 |
| dfg | !AZiuodkxUdoGQVoeUX:matrix.org | 50765 |
| Genshin Impact \| 🇮🇩 | !AGeUOyHpLMMrLYAkXW:matrix.org | 42855 |
| | !tgbPIxPlvGiwpsGJuu:matrix.org | 38501 |
| Raspberry Pi | !wOlkWNmgkAZFxbTaqj:matrix.org | 35246 |
| pro.cxx (Telegram's gated C/C++ chat) | !QSkCgNYFgpOXsoSLpH:russianfedora.online | 29762 |
| | !SayHlEYXdrpSerhLMC:matrix.org | 29324 |
| | !YTvKGNlinIzlkMTVRl:matrix.org | 29005 |
| Firo | !TkvmhOcvgyUClRFoCT:matrix.org | 27048 |
| Filecoin Lobby | !hKwFRvooxscFyToTfI:matrix.org | 26570 |
| GrapheneOS | !UVEsOAdphEMYhxzTah:grapheneos.org | 26466 |
| | !gWiVYCURkxIapWZaGy:ipfs.io | 25534 |
| Manjaro | !PvQGNjiCNulFZsMPud:matrix.org | 25534 |
| Edgeware Public Square | !AEQOsKuwoGVVoVwfDj:matrix.org | 25460 |
| | !fzfHhoTplYBEXfWOaI:matrix.org | 25282 |
| | !ehXvUhWNASUkSLvAGP:matrix.org | 21907 |
| NSFW_Share | !cNHqvzWMtyaMiTKVGu:matrix.org | 19493 |
| GrapheneOS Community | !lAoVmVifHHtoeOAmHO:grapheneos.org | 18941 |

Relation | Total Size |
---|---|
public.state_groups_state | 76 GB |
public.event_json | 10 GB |
public.events | 4327 MB |
public.event_edges | 2910 MB |
public.device_lists_changes_in_room | 2482 MB |
public.event_auth | 2064 MB |
public.event_search | 1778 MB |
public.room_memberships | 1171 MB |
public.event_to_state_groups | 971 MB |
public.current_state_delta_stream | 822 MB |
public.event_auth_chain_links | 761 MB |
public.state_events | 730 MB |
public.received_transactions | 674 MB |
public.event_auth_chains | 632 MB |
public.event_relations | 424 MB |
public.current_state_events | 297 MB |
public.device_lists_remote_cache | 220 MB |
public.device_inbox | 204 MB |
public.e2e_cross_signing_keys | 196 MB |
public.state_groups | 189 MB |
My state groups used to be 270 GB 4 days ago. Then I ran synapse_auto_compressor for 3-4 days nonstop, followed by a reindex and a vacuum full. But 100 GB is still enormous. I deleted the "My Milf Waifu . com" room just now from the Synapse Admin panel, but I am not sure it is actually deleted. Is there any way I can nuke it more directly?
I am baffled at the enormous size of the database.
select count(*) from state_groups sg left join event_to_state_groups esg on esg.state_group=sg.id left join state_group_edges e on e.prev_state_group=sg.id where esg.state_group is null and e.prev_state_group is null;
8719
I've read through all of the comments and there doesn't seem to be a long-term solution... things will accumulate again in the database.
As an update: the auto compressor took 3-4 days to finish the first time; now it takes 20-30 minutes. At least we can schedule that... but Synapse should fix this somehow, because having an "optimized" database of 100 GB is still insane to me. We have around 500 users. Imagine if we had 10 times more. I still don't understand what exactly is stored in that public.state_groups_state table...
@tio-trom Just as a side note: the number of users does not in itself affect this in any meaningful way; what matters is the size of the rooms that users on your HS participate in. It would only multiply as you say if the additional 4500 users joined more rooms as big as the few biggest ones your current 500 users are in.
Is it really working as intended when a single room accumulates 100+ GB worth of data? Is this to be expected/accepted?
@rettichschnidi I am not a developer, so I don't have the exact answer for that. Our Hacklab[.]fi Synapse database takes 93 GB in its entirety at the moment, with many tens of users and a crapton of bridges on top of that... So 100 GB for a single room does sound like a lot to me. Though I think this is not even directly related to the original issue...
Once again: please keep the conversation on topic. General grumbling about the size of the database is off-topic; as is anything that is improved by https://github.com/matrix-org/rust-synapse-compress-state. This issue is specifically about an accumulation of unreferenced state groups.
Could you please confirm that, as of Synapse v1.97.0, the https://github.com/erikjohnston/synapse-find-unreferenced-state-groups tool is still the recommended way to purge unreferenced state groups?
... which are filling up my disk :(
To check if you are also affected, run this query:
If you see numbers in the thousands, then it is this issue. Otherwise, you're not affected by it.