turt2live opened this issue 1 year ago
The logs at the beginning of the loop don't reveal much:
media_repo_synapse | 2023-06-24 03:27:15,741 - synapse.access.http.8008 - 461 - INFO - GET-25 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 30.007sec/0.000sec (0.000sec, 0.001sec) (0.001sec/0.003sec/4) 438B 200 "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30871_0_14_109_1_4_30_0_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30871_0_14_109_1_4_30_0_1 HTTP/1.1" 200 450 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
media_repo_synapse | 2023-06-24 03:27:15,781 - synapse.access.http.8008 - 461 - INFO - POST-29 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 0.006sec/0.000sec (0.000sec, 0.001sec) (0.002sec/0.003sec/3) 2687B 200 "POST /_matrix/client/r0/keys/query HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "POST /_matrix/client/r0/keys/query HTTP/1.1" 200 2699 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
media_repo_synapse | 2023-06-24 03:27:15,784 - synapse.access.http.8008 - 461 - INFO - GET-30 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 0.008sec/0.000sec (0.000sec, 0.002sec) (0.002sec/0.003sec/4) 283B 200 "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.1" 200 295 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
media_repo_synapse | 2023-06-24 03:27:15,829 - synapse.access.http.8008 - 461 - INFO - GET-31 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 0.005sec/0.000sec (0.000sec, 0.001sec) (0.001sec/0.003sec/4) 283B 200 "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.1" 200 295 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
media_repo_synapse | 2023-06-24 03:27:15,864 - synapse.access.http.8008 - 461 - INFO - GET-32 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 0.005sec/0.000sec (0.000sec, 0.001sec) (0.001sec/0.003sec/4) 283B 200 "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.1" 200 295 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
media_repo_synapse | 2023-06-24 03:27:15,899 - synapse.access.http.8008 - 461 - INFO - GET-33 - ::ffff:172.19.0.4 - 8008 - {@travis:localhost} Processed request: 0.005sec/0.000sec (0.000sec, 0.001sec) (0.001sec/0.003sec/4) 283B 200 "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" [0 dbevts]
media_repo_nginx | 172.19.0.1 - - [24/Jun/2023:03:27:15 +0000] "GET /_matrix/client/r0/sync?filter=2&timeout=30000&since=s122_30872_0_14_109_1_4_30_0_1 HTTP/1.1" 200 295 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36" "-"
I think that's related to #15082.
Also seeing this in a local development environment with Synapse 1.85.2; it doesn't appear to be related to #15082 because the payloads are not empty.
Element Web's calls to /sync are returning immediately with the following responses.
3 examples, notice they are all exactly the same:
http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s1_447_0_1_1_1_1_36_0_1
{
    "next_batch": "s1_450_0_1_1_1_1_36_0_1",
    "account_data": {
        "events": [
            {
                "type": "org.matrix.msc3890.local_notification_settings.ZSBKLRYBUA",
                "content": {
                    "is_silenced": false
                }
            }
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s1_447_0_1_1_1_1_36_0_1
{
    "next_batch": "s1_450_0_1_1_1_1_36_0_1",
    "account_data": {
        "events": [
            {
                "type": "org.matrix.msc3890.local_notification_settings.ZSBKLRYBUA",
                "content": {
                    "is_silenced": false
                }
            }
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s1_447_0_1_1_1_1_36_0_1
{
    "next_batch": "s1_450_0_1_1_1_1_36_0_1",
    "account_data": {
        "events": [
            {
                "type": "org.matrix.msc3890.local_notification_settings.ZSBKLRYBUA",
                "content": {
                    "is_silenced": false
                }
            }
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
@thebalaa

> 3 examples, notice they are all exactly the same:

The request URLs you have given us are also exactly the same. Synapse caches /sync responses so that it can avoid the work of recalculating them if the requests are identical, e.g. due to a retransmission; it is not surprising that you get the same response to the same request!
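That caching can be pictured as a lookup keyed on the request parameters. A toy sketch of the idea only (all names here are hypothetical; Synapse's real ResponseCache is more involved):

# Toy model of response caching keyed on the request parameters.
# Hypothetical names throughout; not Synapse's actual implementation.
from typing import Dict, Tuple

_sync_cache: Dict[Tuple[str, str, str], dict] = {}

def compute_sync_response(user_id: str, since: str, filter_id: str) -> dict:
    # Stand-in for the real (expensive) work of building a sync response.
    return {"next_batch": since, "rooms": {}}

def cached_sync(user_id: str, since: str, filter_id: str) -> dict:
    key = (user_id, since, filter_id)
    if key not in _sync_cache:
        _sync_cache[key] = compute_sync_response(user_id, since, filter_id)
    # An identical retried request hits the cache and gets back the
    # byte-for-byte identical response.
    return _sync_cache[key]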
The since token in your examples is s1_447_0_1_1_1_1_36_0_1, and the response's next_batch token, s1_450_0_1_1_1_1_36_0_1, is different. But Element Web did not use the next_batch token as the since parameter in its next request; I don't see any evidence of a problem in Synapse here.
(I note that the logs in the description and in https://github.com/matrix-org/synapse/issues/15824#issuecomment-1605249509 also repeat a request with the same since token.)
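For reference, a minimal sketch of the loop a well-behaved client runs, assuming a plain HTTP client and a placeholder access token (hypothetical values): each response's next_batch becomes the next request's since. A client that keeps resending the same since, as in the three examples above, will receive the same cached response every time.

# Minimal sketch of a correct client-side sync loop (hypothetical token).
import requests

BASE = "http://localhost:8008/_matrix/client/r0/sync"
TOKEN = "syt_placeholder_access_token"

since = None
while True:
    params = {"timeout": 30000}
    if since is not None:
        params["since"] = since
    resp = requests.get(
        BASE,
        params=params,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=35,
    ).json()
    # The crucial step: advance the token. Reusing the same `since`
    # forever produces exactly the repeated-response pattern above.
    since = resp["next_batch"]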
I would guess that either Element Web is stuck repeating the same request, or something is repeatedly resetting its since token.

I have a similar issue on Synapse 1.88.0rc1 running locally on docker where sync (via Hydrogen) is tight looped: the since of the sync request matches the next_batch of the returned response, so I don't think it's a client issue.
I noticed this yesterday and today. Turning off presence seems to stop the continuous sync. It's just a workaround, but I hope it helps find the root cause.
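In case it helps others try the same workaround: presence can be switched off in homeserver.yaml with Synapse's documented presence setting (restart Synapse afterwards):

presence:
  enabled: false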
Same problem on docker.io/matrixdotorg/synapse:v1.92.1 with sqlite. Can reliably get it to fail when I trigger an exception in another part of the code.
Repro steps involve using docker/sqlite synapse with MAS enabled and hooked up correctly. For some reason, sending messages into a room can fail:
ex-full-stack-docker-synapse-1 | 2023-09-20 13:48:18,624 - synapse.http.server - 140 - ERROR - PUT-62 - Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0xffffaab8c5d0 method='PUT' uri='/_matrix/client/v3/rooms/%21IitGfgiPurWkJFrNOJ:synapse/send/m.room.message/1' clientproto='HTTP/1.1' site='8008'>
ex-full-stack-docker-synapse-1 | Traceback (most recent call last):
ex-full-stack-docker-synapse-1 | File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
ex-full-stack-docker-synapse-1 | result = context.run(gen.send, result)
ex-full-stack-docker-synapse-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ex-full-stack-docker-synapse-1 | File "/usr/local/lib/python3.11/site-packages/synapse/rest/client/room.py", line 366, in _do
ex-full-stack-docker-synapse-1 | ) = await self.event_creation_handler.create_and_send_nonmember_event(
ex-full-stack-docker-synapse-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
ex-full-stack-docker-synapse-1 | File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 484, in _do_execute
ex-full-stack-docker-synapse-1 | return func(sql, *args, **kwargs)
ex-full-stack-docker-synapse-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
ex-full-stack-docker-synapse-1 | sqlite3.IntegrityError: FOREIGN KEY constraint failed
ex-full-stack-docker-synapse-1 | 2023-09-20 13:48:18,628 - synapse.access.http.8008 - 465 - INFO - PUT-62 - ::ffff:172.21.0.1 - 8008 - {@bob:synapse} Processed request: 0.053sec/0.000sec (0.009sec, 0.000sec) (0.002sec/0.020sec/15) 55B 500 "PUT /_matrix/client/v3/rooms/%21IitGfgiPurWkJFrNOJ:synapse/send/m.room.message/1 HTTP/1.1" "curl/8.1.2" [0 dbevts]
When this happens, the next /sync request will always tightloop. The problem is that this line is always being hit:
https://github.com/matrix-org/synapse/blob/v1.92.1/synapse/handlers/sync.py#L1993
...because this check is always true:
# _cache is not valid at or before the earliest known stream position, so
# return that the entity has changed.
if stream_pos <= self._earliest_known_stream_pos:
    self.metrics.inc_misses()
    return True
...because stream_pos == self._earliest_known_stream_pos. I'm guessing the exception is causing some Fun which isn't propagating the increased stream position or something?
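To spell out why that makes /sync return immediately, here is a condensed model of the quoted check (loosely based on Synapse's StreamChangeCache; simplified, not the real class): when the requested stream position is at or before the cache's earliest known position, the cache cannot prove that nothing changed, so it must answer "changed", the long-poll completes at once with nothing new, and the client re-syncs from the same token.

# Simplified model of the quoted check; illustrative only, not Synapse code.
class StreamChangeCacheModel:
    def __init__(self, earliest_known_stream_pos: int) -> None:
        self._earliest_known_stream_pos = earliest_known_stream_pos
        self._changed_at: dict[str, int] = {}  # entity -> last change pos

    def has_entity_changed(self, entity: str, stream_pos: int) -> bool:
        if stream_pos <= self._earliest_known_stream_pos:
            # Cache has no data that far back, so conservatively report
            # "changed" -- exactly the branch quoted above.
            return True
        return self._changed_at.get(entity, 0) > stream_pos

# If the position never advances past _earliest_known_stream_pos, every
# sync sees "changed", returns immediately, and the client tightloops.
cache = StreamChangeCacheModel(earliest_known_stream_pos=447)
assert cache.has_entity_changed("@travis:localhost", 447)  # always True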
fwiw, I see this without sending messages. MAS is not configured. sqlite is used, though.
It will be sitting completely idle for several minutes and then suddenly tightloop. No user interaction required.
@DMRobertson the sync token is not being advanced by Synapse; there's a mistake in my example above. The request query string is provided by Element, but Synapse returns the same since token that was passed in the query string.
@kegsay we tried disabling presence and that seemed to resolve the issue for a moment, but then it returned.
> It will be sitting completely idle for several minutes and then suddenly tightloop. No user interaction required.
I had the same experience yesterday but I can't reproduce it anymore:

- 1 container with Synapse with a sqlite backend and the default autogenerated config, plus URL preview enabled
- 1 container with Element Web
- 1 low-privileged user in a room with a few links
It suddenly started to loop extremely fast, even if I restarted the containers or refreshed the browser page, until I deactivated presence.
Hopefully https://github.com/matrix-org/synapse/pull/16540 fixes this
Sounds like it hasn't: https://github.com/matrix-org/synapse/pull/16540#issuecomment-1775949595
@erikjohnston should #16540 be reverted?
> should #16540 be reverted?

No, that fix is still correct, but it seems it is not enough to fix this.
Anything we can do to help move this along? It's quite problematic for us.
@thebalaa if you have an example request URL and response body of a /sync, that would be useful.
Here are two separate client responses from sync:
From Element:
Request URL http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s385_307_0_11_70_1_1_19_0_1
Response
{
    "next_batch": "s385_307_0_11_70_1_1_19_0_1",
    "device_lists": {
        "changed": [
            "@admin:localhost"
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
From a Python client (matrix-nio):
Request URL http://localhost:8008/_matrix/client/r0/sync?access_token=syt_YWRtaW4_KWzSqMuAsUEaRUsAQqcJ_1wY9BF&since=s385_312_0_11_70_1_1_19_0_1
Response
{
    "next_batch": "s385_312_0_11_70_1_1_19_0_1",
    "device_lists": {
        "changed": [
            "@admin:localhost"
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
I'm running Synapse version 1.96.0rc1 using Docker with the image matrixdotorg/synapse:develop. I'm also using an sqlite database.
Here's my homeserver.yaml. I generated it using the generate script in the container:
# Configuration file for Synapse.
#
# This is a YAML file: see [1] for a quick introduction. Note in particular
# that *indentation is important*: all the elements of a list or dictionary
# should have the same indentation.
#
# [1] https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html
#
# For more information on how to configure Synapse, including a complete accounting of
# each option, go to docs/usage/configuration/config_documentation.md or
# https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html
server_name: "localhost"
pid_file: /data/homeserver.pid
listeners:
  - port: 8008
    tls: false
    type: http
    x_forwarded: true
    resources:
      - names: [client, federation]
        compress: false
database:
  name: sqlite3
  args:
    database: /data/homeserver.db
log_config: "/data/localhost.log.config"
media_store_path: /data/media_store
registration_shared_secret: "NKeqBJ3@=aEpq@m6&tyg,7Mp.PHsS=SKH#0gJWVH*Nhig0H9nt"
report_stats: false
macaroon_secret_key: "mRL:x_0p*VuKb2ZAio,^h&YCY77ISYs.OKr*+PBM:E0XJz:ZzV"
form_secret: "gr1E3TptS,_v-UfE6vN+l*dgc6:eSk;V3S6ZYm6B:D30O.St-h"
signing_key_path: "/data/localhost.signing.key"
trusted_key_servers:
  - server_name: "matrix.org"
# vim:ft=yaml
I'm able to reproduce this issue by doing the following:

1. Create an @admin user using the register_new_matrix_user script inside the synapse container.
2. Disable ratelimiting for that user by running INSERT INTO ratelimit_override values ("@admin:localhost", 0, 0); against the database. I found that command from here: https://github.com/matrix-org/synapse/issues/6286#issuecomment-646944920

I was messing around a little more and found that when my client sync is doing the tight loop, if I exec into the synapse container and create another user with register_new_matrix_user, the tight loop immediately stops. Could it possibly be something to do with me disabling ratelimiting for that user? That or maybe something to do with only having a single user in the database? Not sure.
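For reference, step 2 can be applied with a few lines of Python; a sketch assuming the /data/homeserver.db path from the config above (run it while Synapse is stopped to avoid lock contention):

# Sketch: insert the ratelimit_override row from step 2 into sqlite.
# Assumes the /data/homeserver.db path used in the config above.
import sqlite3

conn = sqlite3.connect("/data/homeserver.db")
conn.execute(
    "INSERT INTO ratelimit_override VALUES (?, ?, ?)",
    ("@admin:localhost", 0, 0),
)
conn.commit()
conn.close()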
This is particularly frustrating in test environments, where it seems easily reproducible because traffic is low. I'm guessing actual deployments see this less because new events are somehow able to unwedge things. In my scenario, the error on the foreign key constraint will reliably reproduce this bug, but it's a convoluted setup as it involves MAS. I'll see if I can dig into this more this week.
I used the account and password from the Mjolnir configuration file to log in to Element Web, which caused a large number of never-ending _matrix/client/v3/sync requests in the console. đŸ˜‚
Description

In my test environment for matrix-media-repo I've got a localhost homeserver running on sqlite in docker which suddenly returns immediately to /sync requests. This causes Element to dutifully send the next request, to which Synapse responds immediately, and repeat.

Steps to reproduce
Unclear.
Homeserver
localhost
Synapse Version
v1.85.2
Installation Method
Docker (matrixdotorg/synapse)
Database
sqlite3
Workers
Single process
Platform
Running in Docker.
Configuration
Relevant log output
Anything else that would be useful to know?
Restarting the Docker Compose stack can sometimes help, but it eventually tightloops again.