matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0

Fix bug that could cause a `/sync` to tightloop with sqlite after restart #16540

Closed erikjohnston closed 1 year ago

erikjohnston commented 1 year ago

This could happen if the last rows in the account data stream were inserted into `account_data`. After a restart, the max account data stream ID would be calculated without looking at the `account_data` table, and so would be stale.
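A minimal sketch of the failure mode described above, not the actual Synapse code: the current stream position is taken as the maximum `stream_id` across a set of tables, and leaving `account_data` out of that set yields a stale value after a restart. The `max_stream_id` helper, the one-column schema, and the `room_account_data` table name are assumptions made for illustration.

```python
import sqlite3


def max_stream_id(conn, tables):
    """Return the largest stream_id across the given tables (0 if all are empty)."""
    current = 0
    for table in tables:
        # Table names come from a fixed list in this sketch, never from user input.
        row = conn.execute(
            f"SELECT COALESCE(MAX(stream_id), 0) FROM {table}"
        ).fetchone()
        current = max(current, row[0])
    return current


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, heavily simplified schema: only the stream_id column.
    CREATE TABLE room_account_data (stream_id INTEGER NOT NULL);
    CREATE TABLE account_data (stream_id INTEGER NOT NULL);
    INSERT INTO room_account_data VALUES (10);
    -- The most recent rows in the stream landed in account_data.
    INSERT INTO account_data VALUES (11), (12);
    """
)

# Startup calculation that skips account_data: the position is too low.
stale = max_stream_id(conn, ["room_account_data"])

# Including account_data gives the real position.
fixed = max_stream_id(conn, ["room_account_data", "account_data"])

print(stale, fixed)  # 10 12
```

With the stale value, the server believes the stream is at 10 while rows 11 and 12 already exist, which is the kind of mismatch that could leave `/sync` returning immediately without making progress.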

DMRobertson commented 1 year ago

Fixes #15824?

erikjohnston commented 1 year ago

> Fixes #15824?

Quite possibly

thebalaa commented 1 year ago

Didn't fix #15824

Still seeing it with the following development-based container:

 ...
                "gitsha1": "3df70aa80001e05b0bbe69fd3328f11aceaab4aa",
                "org.homeserver": "true",
                "org.opencontainers.image.documentation": "https://github.com/matrix-org/synapse/blob/master/docker/README.md",
                "org.opencontainers.image.licenses": "Apache-2.0",
                "org.opencontainers.image.source": "https://github.com/matrix-org/synapse.git",
                "org.opencontainers.image.url": "https://matrix.org/docs/projects/server/synapse",
                "org.opencontainers.image.version": "1.95.0rc1"

Note that the sync query below returns a response that does not advance `next_batch`:

http://localhost:8008/_matrix/client/r0/sync?filter=0&timeout=30000&since=s93_7_0_1_5_1_1_11_0_1

{
    "next_batch": "s93_7_0_1_5_1_1_11_0_1",
    "device_lists": {
        "changed": [
            "@admin:localhost"
        ]
    },
    "device_one_time_keys_count": {
        "signed_curve25519": 50
    },
    "org.matrix.msc2732.device_unused_fallback_key_types": [
        "signed_curve25519"
    ],
    "device_unused_fallback_key_types": [
        "signed_curve25519"
    ]
}
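A client can notice this condition by comparing the `since` token it sent with the `next_batch` it got back. The following is a rough sketch under that assumption only; `StallDetector` and its `max_repeats` threshold are hypothetical names and are not part of matrix-nio or Synapse.

```python
class StallDetector:
    """Counts consecutive sync responses whose next_batch equals the since token."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.repeats = 0

    def observe(self, since: str, next_batch: str) -> bool:
        """Record one sync round trip; return True once the token looks stuck."""
        if next_batch == since:
            self.repeats += 1
        else:
            # The token advanced, so any earlier stall is over.
            self.repeats = 0
        return self.repeats >= self.max_repeats


detector = StallDetector(max_repeats=2)
token = "s93_7_0_1_5_1_1_11_0_1"

first = detector.observe(since=token, next_batch=token)   # one repeat so far
second = detector.observe(since=token, next_batch=token)  # threshold reached
print(first, second)  # False True
```

When `observe` returns True, a client could back off before the next request instead of re-issuing the same `/sync` immediately.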

Restarting Synapse fixes the tightloop temporarily, but it returns within a few minutes.

Our reproduction steps: we have a custom matrix-nio based client that is syncing. We then log in via Element with the same Matrix ID, and within a few minutes it starts tightlooping.

Deployment: SQLite database, Docker / Docker Compose.

erikjohnston commented 1 year ago

Hmm, I wonder if we have a similar problem with device lists then