matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.82k stars 2.13k forks source link

Poor performance because of two large rooms with spam bots #12730

Closed lnicola closed 2 years ago

lnicola commented 2 years ago

Description

After upgrading my HS, everything got very slow. Loading or uploading images and opening the room list take a couple of minutes. Every so often, Element complains it can't reach the server.

image

Logs don't tell much:

2022-05-13 15:08:44,007 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58591 - {PUT-O-49291} [safaradeg.net] Completed request: 200 OK in 0.96 secs, got 11 bytes - PUT matrix://safaradeg.net/_matrix/federation/v1/send/1652436499039
2022-05-13 15:08:44,008 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58591 - TX [safaradeg.net] {1652436499039} got 200 response
2022-05-13 15:08:44,020 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58590 - {PUT-O-49290} [chenmis.ems.host] Completed request: 200 OK in 1.05 secs, got 11 bytes - PUT matrix://chenmis.ems.host/_matrix/federation/v1/send/1652436499038
2022-05-13 15:08:44,020 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58590 - TX [chenmis.ems.host] {1652436499038} got 200 response
2022-05-13 15:08:44,024 - synapse.access.http.8008 - 427 - INFO - PUT-12194 - 2a01:4f9:c011:1fa9::1 - 8008 - {moonstonemint.com} Processed request: 1.793sec/0.186sec (0.010sec, 0.003sec) (0.141sec/0.197sec/7) 11B 200 "PUT /_matrix/federation/v1/send/1652437932548 HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:44,024 - synapse.access.http.8008 - 427 - INFO - GET-12210 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.275sec/0.181sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 983B 200 "GET /_matrix/federation/v1/event/%24Xt5xTCYypmLW6AQ-dq_GyxyK4qcMYBaY6XvKvxy0cS4 HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,025 - synapse.access.http.8008 - 427 - INFO - GET-12211 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.278sec/0.178sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 923B 200 "GET /_matrix/federation/v1/event/%24N3Lt5mZnsNtPQiXJLVVM37OgDycuts3qcb2YLqUQ-XI HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,027 - synapse.access.http.8008 - 427 - INFO - GET-12209 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.277sec/0.181sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 838B 200 "GET /_matrix/federation/v1/event/%24KOiT4__pjS5YjQDLC7-Fw58uWCuYMMw6u6Doi_Jo9ck HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,122 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58592 - {PUT-O-49292} [vac.systemec.nl] Completed request: 200 OK in 0.93 secs, got 11 bytes - PUT matrix://vac.systemec.nl/_matrix/federation/v1/send/1652436499040
2022-05-13 15:08:44,122 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58592 - TX [vac.systemec.nl] {1652436499040} got 200 response
2022-05-13 15:08:44,414 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58601 - TX [beisler.com] {1652436499043} Sending transaction [1652436499043], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:44,573 - synapse.util.ratelimitutils - 132 - INFO - GET-12218 - Ratelimiter: queueing request (queue now 1 items)
2022-05-13 15:08:44,574 - synapse.util.ratelimitutils - 132 - INFO - GET-12219 - Ratelimiter: queueing request (queue now 2 items)
2022-05-13 15:08:44,579 - synapse.access.http.8008 - 427 - INFO - PUT-12207 - 2604:4300:a:136::2 - 8008 - {randomsonicnet.org} Processed request: 0.912sec/0.201sec (0.006sec, 0.000sec) (0.424sec/0.067sec/5) 11B 200 "PUT /_matrix/federation/v1/send/1651597922922 HTTP/1.0" "Synapse/1.58.0" [0 dbevts]
2022-05-13 15:08:44,588 - synapse.http.server - 95 - INFO - GET-12216 - <XForwardedForRequest at 0x7f25e97358e0 method='GET' uri='/_matrix/federation/v1/state_ids/%21vRGLvqJYlFvzpThbxI%3Amatrix%2Eorg?event_id=%24InlhuE_OklBLdiflwD3bGkcTivcsj4gG1Hz4fGeNIUk' clientproto='HTTP/1.0' site='8008'> SynapseError: 404 - State not known at event $InlhuE_OklBLdiflwD3bGkcTivcsj4gG1Hz4fGeNIUk
2022-05-13 15:08:44,597 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58602 - TX [krantz.one] {1652436499044} Sending transaction [1652436499044], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:44,603 - synapse.http.federation.well_known_resolver - 253 - INFO - federation_transaction_transmission_loop-58602 - Fetching https://krantz.one/.well-known/matrix/server
2022-05-13 15:08:44,699 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58596 - {PUT-O-49294} [hpdeifel.de] Completed request: 200 OK in 1.01 secs, got 11 bytes - PUT matrix://hpdeifel.de/_matrix/federation/v1/send/1652436499042
2022-05-13 15:08:44,699 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58596 - TX [hpdeifel.de] {1652436499042} got 200 response
2022-05-13 15:08:44,701 - synapse.access.http.8008 - 427 - INFO - GET-12213 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.456sec/0.120sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 983B 200 "GET /_matrix/federation/v1/event/%24-zZtUPWzoVZV2nnlkWqAyizCw2FnNtTw1PU5DLCoxUI HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,702 - synapse.access.http.8008 - 427 - INFO - GET-12214 - 86.120.82.123 - 8008 - {@grayshade:dend.ro} Processed request: 0.361sec/0.116sec (0.003sec, 0.000sec) (0.138sec/0.051sec/2) 658B 200 "GET /_matrix/client/r0/sync?filter=0&timeout=30000&since=s517124_668919_59_109301_39335_11_2822_32192_1 HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" [0 dbevts]
2022-05-13 15:08:44,702 - synapse.access.http.8008 - 427 - INFO - PUT-12208 - 2001:1ad0:c4fe:a::142 - 8008 - {0wnz.at} Processed request: 1.018sec/0.116sec (0.007sec, 0.001sec) (0.481sec/0.072sec/5) 11B 200 "PUT /_matrix/federation/v1/send/1652300301104 HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:44,702 - synapse.access.http.8008 - 427 - INFO - GET-12217 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.213sec/0.115sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 872B 200 "GET /_matrix/federation/v1/event/%24HvsHJygo2W8U7NdfQFr0fHuRA27v_4C3A_EKcSxoX-Y HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,704 - synapse.access.http.8008 - 427 - INFO - GET-12216 - 192.9.142.107 - 8008 - {cjearls2.ddns.net} Processed request: 0.360sec/0.116sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 105B 404 "GET /_matrix/federation/v1/state_ids/%21vRGLvqJYlFvzpThbxI%3Amatrix%2Eorg?event_id=%24InlhuE_OklBLdiflwD3bGkcTivcsj4gG1Hz4fGeNIUk HTTP/1.0" "-" [1 dbevts]
2022-05-13 15:08:44,705 - synapse.access.http.8008 - 427 - INFO - GET-12215 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.370sec/0.108sec (0.004sec, 0.002sec) (0.000sec/0.000sec/0) 929B 200 "GET /_matrix/federation/v1/event/%241lwQKfZS9WQPwcnkEaptKf9WjsWvFVW7u3F4F1vBoMk HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:44,708 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58603 - TX [hmlw.me] {1652436499045} Sending transaction [1652436499045], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:44,873 - synapse.federation.transport.server.federation - 102 - INFO - PUT-12220 - Received txn 1651267025281 from feline.support. (PDUs: 0, EDUs: 1)
2022-05-13 15:08:44,970 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58594 - {PUT-O-49293} [dfri.se] Completed request: 200 OK in 1.50 secs, got 11 bytes - PUT matrix://dfri.se/_matrix/federation/v1/send/1652436499041
2022-05-13 15:08:44,970 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58594 - TX [dfri.se] {1652436499041} got 200 response
2022-05-13 15:08:45,250 - synapse.access.http.8008 - 427 - INFO - GET-12219 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.331sec/0.543sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 844B 200 "GET /_matrix/federation/v1/event/%24U2aCbAY5sYFp1w38HCaI8hLKxVNYdoT9z_d4QxKBdQQ HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:45,278 - synapse.access.http.8008 - 427 - INFO - GET-12218 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.329sec/0.574sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 815B 200 "GET /_matrix/federation/v1/event/%24_C1bXeGmXhJHVU5oebBHwZjfkKqmpDsNV0hjtkNvGo8 HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:45,554 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58605 - TX [matrix.jschwietzke.de] {1652436499046} Sending transaction [1652436499046], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:45,635 - synapse.handlers.presence - 788 - INFO - persist_presence_changes-108 - Persisting 22 unpersisted presence updates
2022-05-13 15:08:46,052 - synapse.util.ratelimitutils - 132 - INFO - GET-12232 - Ratelimiter: queueing request (queue now 1 items)
2022-05-13 15:08:46,885 - synapse.access.http.8008 - 427 - INFO - PUT-12220 - 185.86.106.217 - 8008 - {feline.support} Processed request: 1.341sec/0.965sec (0.004sec, 0.002sec) (0.030sec/0.068sec/5) 11B 200 "PUT /_matrix/federation/v1/send/1651267025281 HTTP/1.0" "Synapse/1.57.1" [0 dbevts]
2022-05-13 15:08:46,885 - synapse.access.http.8008 - 427 - INFO - GET-12221 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.285sec/0.957sec (0.006sec, 0.000sec) (0.000sec/0.000sec/0) 871B 200 "GET /_matrix/federation/v1/event/%24M0gbRVYXpoFsv0hn8ThdE2AtuX6i0bpqgLpx4CUW5jA HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:46,886 - synapse.access.http.8008 - 427 - INFO - GET-12222 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.259sec/0.866sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 801B 200 "GET /_matrix/federation/v1/event/%24w_oQzOw5Mt3FGtBhC6QcwT4oNM1OuLKSoVpSwXFL3go HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:46,886 - synapse.access.http.8008 - 427 - INFO - GET-12223 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.262sec/0.862sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 801B 200 "GET /_matrix/federation/v1/event/%24eqe7R9mEgVSV4jMxBb2xyLOqg09iolpcb1TkiN-shx0 HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:46,887 - synapse.access.http.8008 - 427 - INFO - GET-12224 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.263sec/0.861sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 780B 200 "GET /_matrix/federation/v1/event/%2487sGdSy3fzg8XiSoOExnebNXQ4BOCe8jo3LiJPprSZc HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:46,887 - synapse.access.http.8008 - 427 - INFO - GET-12225 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.270sec/0.854sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 1242B 200 "GET /_matrix/federation/v1/event/%24lHQ4fhBsVq3IXLZtimajMV0VJPgM7JdnsGZXAbews8w HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:46,891 - synapse.access.http.8008 - 427 - INFO - GET-12226 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.276sec/0.851sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 801B 200 "GET /_matrix/federation/v1/event/%24ls5KQlCUNu8GfszQkpRf2yJCMpG2ViLfNPEbLr3xDH8 HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:46,892 - synapse.http.server - 95 - INFO - GET-12230 - <XForwardedForRequest at 0x7f2597ca1640 method='GET' uri='/_matrix/federation/v1/state_ids/%21vRGLvqJYlFvzpThbxI%3Amatrix%2Eorg?event_id=%24GMT0w4XKvS94hY0DaG7j8-KeqLmbkuxrry_1riiUPCU' clientproto='HTTP/1.0' site='8008'> SynapseError: 404 - State not known at event $GMT0w4XKvS94hY0DaG7j8-KeqLmbkuxrry_1riiUPCU
2022-05-13 15:08:46,901 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58609 - TX [strugee.net] {1652436499047} Sending transaction [1652436499047], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:46,902 - synapse.http.federation.well_known_resolver - 253 - INFO - federation_transaction_transmission_loop-58609 - Fetching https://strugee.net/.well-known/matrix/server
2022-05-13 15:08:47,003 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58601 - {PUT-O-49295} [beisler.com] Completed request: 200 OK in 2.59 secs, got 11 bytes - PUT matrix://beisler.com/_matrix/federation/v1/send/1652436499043
2022-05-13 15:08:47,003 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58601 - TX [beisler.com] {1652436499043} got 200 response
2022-05-13 15:08:47,004 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58603 - {PUT-O-49297} [hmlw.me] Completed request: 200 OK in 2.30 secs, got 11 bytes - PUT matrix://hmlw.me/_matrix/federation/v1/send/1652436499045
2022-05-13 15:08:47,004 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58603 - TX [hmlw.me] {1652436499045} got 200 response
2022-05-13 15:08:47,570 - synapse.access.http.8008 - 427 - INFO - GET-12230 - 192.9.142.107 - 8008 - {cjearls2.ddns.net} Processed request: 1.126sec/0.678sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 105B 404 "GET /_matrix/federation/v1/state_ids/%21vRGLvqJYlFvzpThbxI%3Amatrix%2Eorg?event_id=%24GMT0w4XKvS94hY0DaG7j8-KeqLmbkuxrry_1riiUPCU HTTP/1.0" "-" [1 dbevts]
2022-05-13 15:08:47,571 - synapse.access.http.8008 - 427 - INFO - GET-12227 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 1.132sec/0.675sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 903B 200 "GET /_matrix/federation/v1/event/%244RVYUq0xcZPPjXsju9oFEarfLtP0TqzeoCumQ3i5m9w HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:47,605 - synapse.access.http.8008 - 427 - INFO - GET-12231 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 1.128sec/0.711sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 978B 200 "GET /_matrix/federation/v1/event/%24FShUuqicBNlgfUGGJTR4LhmudtotQ72YDXeiuORPr0k HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:47,606 - synapse.access.http.8008 - 427 - INFO - GET-12228 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 1.135sec/0.706sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 798B 200 "GET /_matrix/federation/v1/event/%2465P25zy348ezHOhZb7oM4F7-usX-SZ7MgZLqKrOktZg HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:47,609 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58610 - TX [matrix.getdata.com] {1652436499048} Sending transaction [1652436499048], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:47,609 - synapse.http.federation.well_known_resolver - 253 - INFO - federation_transaction_transmission_loop-58610 - Fetching https://matrix.getdata.com/.well-known/matrix/server
2022-05-13 15:08:47,691 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58605 - {PUT-O-49298} [matrix.jschwietzke.de] Completed request: 200 OK in 2.14 secs, got 11 bytes - PUT matrix://matrix.jschwietzke.de/_matrix/federation/v1/send/1652436499046
2022-05-13 15:08:47,691 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58605 - TX [matrix.jschwietzke.de] {1652436499046} got 200 response
2022-05-13 15:08:47,694 - synapse.http.federation.well_known_resolver - 284 - INFO - federation_transaction_transmission_loop-58602 - Error fetching https://krantz.one/.well-known/matrix/server: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2022-05-13 15:08:47,766 - synapse.federation.transport.server.federation - 102 - INFO - PUT-12233 - Received txn 1649290065398 from awful.club. (PDUs: 0, EDUs: 1)
2022-05-13 15:08:47,881 - synapse.access.http.8008 - 427 - INFO - GET-12234 - 86.120.82.123 - 8008 - {@grayshade:dend.ro} Processed request: 0.773sec/0.103sec (0.004sec, 0.000sec) (0.011sec/0.041sec/2) 3305B 200 "GET /_matrix/client/r0/sync?filter=0&timeout=30000&since=s517124_668922_59_109301_39335_11_2822_32192_1 HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0" [0 dbevts]
2022-05-13 15:08:47,925 - synapse.federation.transport.server.federation - 102 - INFO - PUT-12236 - Received txn 1649328244171 from subspace.im. (PDUs: 0, EDUs: 1)
2022-05-13 15:08:47,943 - synapse.access.http.8008 - 427 - INFO - PUT-12233 - 142.93.86.51 - 8008 - {awful.club} Processed request: 0.879sec/0.060sec (0.001sec, 0.001sec) (0.031sec/0.008sec/2) 11B 200 "PUT /_matrix/federation/v1/send/1649290065398 HTTP/1.0" "Synapse/1.50.2" [0 dbevts]
2022-05-13 15:08:48,020 - synapse.access.http.8008 - 427 - INFO - GET-12235 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.174sec/0.074sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 828B 200 "GET /_matrix/federation/v1/event/%24oBWBWoWMKXAvepIXwKzEAJKGE48U1tbn0zdmfxg1hgg HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,207 - synapse.util.ratelimitutils - 132 - INFO - GET-12239 - Ratelimiter: queueing request (queue now 1 items)
2022-05-13 15:08:48,235 - synapse.access.http.8008 - 427 - INFO - PUT-12236 - 2607:5300:60:9826:140::1 - 8008 - {subspace.im} Processed request: 0.248sec/0.214sec (0.002sec, 0.001sec) (0.037sec/0.018sec/2) 11B 200 "PUT /_matrix/federation/v1/send/1649328244171 HTTP/1.0" "Synapse/1.53.0" [0 dbevts]
2022-05-13 15:08:48,384 - synapse.access.http.8008 - 427 - INFO - GET-12237 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.383sec/0.146sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 861B 200 "GET /_matrix/federation/v1/event/%24FksStydgmfk7fpmHwusALURxQaCvjObqfPzLaJs7SAk HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,384 - synapse.access.http.8008 - 427 - INFO - GET-12238 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.297sec/0.145sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 861B 200 "GET /_matrix/federation/v1/event/%24uf7JBmldVjqNC-0ko7CrIIsFrb4cDGukchTUS-n_vTg HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,490 - synapse.access.http.8008 - 427 - INFO - GET-12239 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.381sec/0.090sec (0.005sec, 0.000sec) (0.000sec/0.000sec/0) 983B 200 "GET /_matrix/federation/v1/event/%243koV0sZ1utaYwmLm5Mx_uv4s6aGordrlKTBwBoQOjwU HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,604 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58619 - TX [monero.social] {1652436499049} Sending transaction [1652436499049], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:48,678 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58620 - TX [chat.gadgetusaf.com] {1652436499050} Sending transaction [1652436499050], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:48,679 - synapse.http.federation.well_known_resolver - 253 - INFO - federation_transaction_transmission_loop-58620 - Fetching https://chat.gadgetusaf.com/.well-known/matrix/server
2022-05-13 15:08:48,775 - synapse.access.http.8008 - 427 - INFO - GET-12241 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.201sec/0.097sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 975B 200 "GET /_matrix/federation/v1/event/%24StUc6T_BVWyfLl4e5_iUP09YNtfZ56QlYiCj_Ar2Myc HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,939 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58602 - {PUT-O-49296} [krantz.one] Completed request: 200 OK in 4.34 secs, got 11 bytes - PUT matrix://krantz.one/_matrix/federation/v1/send/1652436499044
2022-05-13 15:08:48,939 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58602 - TX [krantz.one] {1652436499044} got 200 response
2022-05-13 15:08:48,939 - synapse.access.http.8008 - 427 - INFO - GET-12242 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.106sec/0.161sec (0.002sec, 0.000sec) (0.000sec/0.000sec/0) 812B 200 "GET /_matrix/federation/v1/event/%24MwiMxyjd_J9dJK-p8z2pwaPSCNf9j9tFAE1GsIj2hyE HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:48,945 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58622 - TX [ungleich.ch] {1652436499051} Sending transaction [1652436499051], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:49,019 - synapse.access.http.8008 - 427 - INFO - GET-12243 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.178sec/0.074sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 815B 200 "GET /_matrix/federation/v1/event/%24SOJwzMg6uoXaxeT9oqnqhcBBzZqrTzEiFh2jDzx0aA8 HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:49,035 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58623 - TX [foxo.me] {1652436499052} Sending transaction [1652436499052], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:49,116 - synapse.access.http.8008 - 427 - INFO - GET-12244 - 134.0.29.111 - 8008 - {kashofer.org} Processed request: 0.255sec/0.094sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 859B 200 "GET /_matrix/federation/v1/event/%24hZ6IV7fVO8IQou2eJ43-tDHas6DINmOAYUlW3Bf369I HTTP/1.0" "Synapse/1.56.0" [1 dbevts]
2022-05-13 15:08:49,142 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58624 - TX [sosnowkadub.de] {1652436499053} Sending transaction [1652436499053], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:49,266 - synapse.federation.sender.transaction_manager - 120 - INFO - federation_transaction_transmission_loop-58625 - TX [kazmen.com] {1652436499054} Sending transaction [1652436499054], (PDUs: 0, EDUs: 1)
2022-05-13 15:08:49,272 - synapse.metrics._gc - 120 - INFO - sentinel - Collecting gc 1
2022-05-13 15:08:49,567 - synapse.http.matrixfederationclient - 285 - INFO - federation_transaction_transmission_loop-58619 - {PUT-O-49301} [monero.social] Completed request: 200 OK in 0.96 secs, got 11 bytes - PUT matrix://monero.social/_matrix/federation/v1/send/1652436499049
2022-05-13 15:08:49,567 - synapse.federation.sender.transaction_manager - 170 - INFO - federation_transaction_transmission_loop-58619 - TX [monero.social] {1652436499049} got 200 response
2022-05-13 15:08:49,578 - synapse.http.federation.well_known_resolver - 197 - INFO - federation_transaction_transmission_loop-58610 - Error parsing well-known for b'matrix.getdata.com': Non-200 response 404
2022-05-13 15:08:49,602 - synapse.access.http.8008 - 427 - INFO - GET-12247 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.463sec/0.039sec (0.003sec, 0.000sec) (0.000sec/0.000sec/0) 801B 200 "GET /_matrix/federation/v1/event/%24pHgg3WSa2sEkUV7SRdm0Pq5Q0zc6zgCw4KkCWSpFiTg HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:49,602 - synapse.access.http.8008 - 427 - INFO - GET-12248 - 83.25.82.45 - 8008 - {matrix.eviljab.me} Processed request: 0.465sec/0.037sec (0.001sec, 0.000sec) (0.000sec/0.000sec/0) 801B 200 "GET /_matrix/federation/v1/event/%24FyegCkqJo0V9X_RB0VvQn_iPKsTm826PXBD9XVTwgwI HTTP/1.0" "Synapse/1.58.1" [0 dbevts]
2022-05-13 15:08:49,678 - synapse.util.ratelimitutils - 132 - INFO - GET-12253 - Ratelimiter: queueing request (queue now 1 items)
2022-05-13 15:08:49,686 - synapse.metrics._gc - 120 - INFO - sentinel - Collecting gc 2

I do have Prometheus set up, but I'm not sure what to look for and the historical data is mostly gone.

image

image

image

I upgraded from 1.56 to 1.57 on the 2nd of May, and to 1.58 on the 4th of May.

Version information

If not matrix.org:

DMRobertson commented 2 years ago

Thanks for the report and the prometheus screenshots.

For the suspicious change in CPU usage, it'd be worth looking at the broken down CPU graphs to see if there's anything specific. What do your "per-block metrics" and "requests" sections look like---is there any particular bit of the application there that looks to be suddenly churning up CPU?

It might also be instructive to see the "databases" section too.

From the graphs and the dates you give, it looks like there might be a performance regression in 1.57. To confirm this, can you roll back to 1.56 to see if performance improves?

lnicola commented 2 years ago

Per-block metrics, looks like state resolution:

image

Requests:

image

Databases:

image

DMRobertson commented 2 years ago

Possibly the same underlying cause as #12547?

lnicola commented 2 years ago

I tried to roll back and now it won't start: Cannot use this database as it is too new for the server to understand.

DMRobertson commented 2 years ago

Was the server in use between the 28th of April and the 1st of May? The graphs show surprisingly little activity.

lnicola commented 2 years ago

Yes, as far as I know. Even if not actively in use, it should have been present in some federated rooms.

lnicola commented 2 years ago

What exactly is this GC doing?

I see Synapse in the D state (waiting for I/O), but it doesn't seem to be touching the disk.

lnicola commented 2 years ago

Yeah, this does look very similar to #12547.

prurigro commented 2 years ago

Hey, in case you don't catch my response in the other issue, did you want to try disabling federation temporarily and see if the issues go away? If they do, you're most likely looking at the same problem as me.

lnicola commented 2 years ago

I can't really disable federation, but I'm pretty sure it would fix it.

prurigro commented 2 years ago

If it's like the issue is for me, you'll get to a point where your server is basically unusable unless you do. (It took about a month to get there)

lnicola commented 2 years ago

Yeah, it does this every couple of minutes, so it's already unusable.

lnicola commented 2 years ago

@prurigro it seems you were on to something.

Well, this is embarrassing. I had registration open on my server for about two weeks, and I ended up with 3810 spam accounts. They all follow one of these two naming conventions:

I'm not sure what they were doing (besides annoying people). I ran the following query to check the rooms they've joined:

# select room_id, name, canonical_alias, count(*) as count                                                                                                                                                                                                                                  
    from state_groups_state
    inner join room_stats_state using (room_id)
    group by room_id, name, canonical_alias
    order by count desc;

I think these are the more interesting ones:

                     room_id                     |                  name                  |          canonical_alias          |  count   
-------------------------------------------------+----------------------------------------+-----------------------------------+----------
 !twohoPqivntpjGWCJZ:matrix.org                  | My Milf Waifu . com                    |                                   | 39882983
 !naXxpfvVqaRoeCbiug:matrix.org                  | [redacted, offensive]                  | #[redacted]:matrix.org            | 15630320
 !AZiuodkxUdoGQVoeUX:matrix.org                  | dfg                                    | #bogaggooa:matrix.org             |  3774643
 !vRGLvqJYlFvzpThbxI:matrix.org                  | Furry Tech                             | #furrytech:matrix.org             |  2011859
 !ehXvUhWNASUkSLvAGP:matrix.org                  |                                        |                                   |   151291
 !rWyGejKmuZJaExbUOf:matrix.org                  | 111                                    | #sneed1:matrix.org                |   145619
 !rjNeouFqUBGzexdVMc:g33k.se                     | Otaku [OLD]                            |                                   |    22250
 !iCrqcrLOzJUNcyHTjR:matrix.kharkiv.dcomm.net.ua | Новини                                 | #news:matrix.kharkiv.dcomm.net.ua |    20175
 !WlOOWOsGpBBJNXqwSY:stopdronebl.org             |                                        |                                   |    14037

Note the huge number of events in the first one.

I tried to join most of them, but it didn't work, or they don't exist any more.

I deactivated the spam accounts (curl -XPOST -H Authorization: Bearer REDACTED -H Content-Type: application/json -d { "erase": false } http://localhost:8008/_synapse/admin/v1/deactivate/USERNAME), using xargs to run 100 of these in parallel. It was pretty slow, but finished in about one hour or two.

I then tried to also erase their data (same API call, but with erase set to true), but, even with only 10 calls in parallel, these never finish. The server is still unresponsive. Looking at the SQL that's running, the queries I've seen were fine (had indexes and ran quickly). It's just spending time in Synapse for some reason, maybe some kind of "N + 1 selects" problem, or just swapping (I just added an extra 4 GB of swap).

My suspicion is that it will work fine if I purge the two larger rooms. I'm not doing it yet because I'd rather try to completely deactivate the accounts first, and maybe one of the developers would like to investigate the slow deletion.


I'm now trying to purge the largest room, but nothing happens.


Deleting the !vRGLvqJYlFvzpThbxI:matrix.org room took 734 minutes and kicked 124 users. I thought deactivate would have removed them from the room. I'm still trying to delete the spam users, but it only managed to get through 18 of them.

SNThrailkill commented 2 years ago

I tried to roll back and now it won't start: Cannot use this database as it is too new for the server to understand.

I'm also seeing this and can't seem to get my synapse instance back even I roll back my DB and my Synapse version. Any ideas on getting this working again?

prurigro commented 2 years ago

@lnicola OK yeah, I'm thinking this pretty well confirms malicious bots. I'd had open registration for years without really being noticed, but it seems like I ended up on some list of servers at some point and new registrations started to happen. I only hit around 300 accounts total and know for sure that about 30 were real, the majority of the others weren't in any channels so I'm not sure if they were parked for later use or were there to make finding the active ones more difficult.

I noticed my domain had actually been banned from the archlinux.org matrix server, so I assume whatever they're doing includes interacting with other servers. I ended up wiping the database and starting fresh with my top level domain (which hasn't been banned anywhere), and this time registrations are closed. Same configs, same server, same real people and everything is lightning fast now- it's kind of a shame we lost all that history, but what can you do.

One other thing I noticed when deploying a fresh install on the old domain was that I immediately started getting federation activity from dozens of servers, most of which were getting rejected due to not having a proper certificate.

lnicola commented 2 years ago

Update: trying to delete !AZiuodkxUdoGQVoeUX:matrix.org returned an Internal server error after 1814 minutes. I'm still trying to delete the account data (with erase), but it only gets through 10 or 20 80 a day or so.

lnicola commented 2 years ago

I tried py-spy, but it doesn't say much:

synapse

squahtx commented 2 years ago

I tried py-spy, but it doesn't say much:

synapse

By default, each py-spy collects 100 samples per second, ie. each sample is 10 ms apart. There are 6,543 samples in the database thread there and 6,643 samples in _maybe_gc, which is an unusual amount of time to be spending in those two places.

lnicola commented 2 years ago

That's right, I'm seeing gen 2 GC times on the order of minutes:

image

That's also visible in one of the screenshots in the issue description.

lnicola commented 2 years ago

@babolivier sorry for the ping, but I saw your comments in #12778 and #12788.

I'm also having size-related eviction rates that seem quite high for get_current_state_ids and stateGroupsMemberCache:

image

(TL;DR of the issue is that some 2000 spam accounts joined two large rooms. I've deactivated the accounts, but I'm having trouble deleting the rooms.)

I did increase cache.global_factor to 10, but it doesn't seem to help. Can Synapse reload that parameter at run-time (on e.g. systemctl reload matrix-synapse). Should I bump it more?

EDIT: yeah, restarting Synapse appears to have increased the cache size:

image

lnicola commented 2 years ago

Another update: I added more memory and Synapse managed to delete the users from the largest room. It was quite slow, even with force_purge. Right now, the room is empty, and Synapse is removing its users from users_in_public_rooms, including the ones from other servers. This happens one user at a time, and is quite slow -- 1-2/s. With around 141 192 users there, this might take one more day or two.

lnicola commented 2 years ago

It finally finished, but left a lot of junk in current_state_delta_stream, state_groups and state_groups_state. Performance is fine again.

H-Shay commented 2 years ago

Since the issue was determined to be spambots and not a bug, I am going to go ahead and close this.

lnicola commented 2 years ago

@H-Shay I still think the slow account deactivation is a real issue.

H-Shay commented 2 years ago

No problem, would you be willing to open a new issue specific to the slow account deactivation?