Closed XDjackieXD closed 1 year ago
Did the upgrade include some kind of glibc upgrade? This can cause problems if you're not using the C locale in postgres, see e.g. https://github.com/matrix-org/synapse/issues/6696#issuecomment-575280941
https://github.com/matrix-org/synapse/issues/11568 sounds similar too.
Otherwise, if you have an example of a room that is still missing, we can give you some SQL commands to run directly in the database to diagnose the problem.
Thanks for the pointers! Yes the update included a glibc upgrade and the locale is not C for whatever reason (I found my initial commands in the shell history for creating the db and it was C so whatever I did in the last years of upgrading managed to accidentally changing the locale...)
So at least I know what to do to prevent this in the future ^^ Is there anything to do right now other than to change the locale? (especially in the direction of "is it likely that the corruption of the db now will cause issues in the future")
Going through the troubleshooting-sql-queries of the linked issues, the following select query returns results (all others listed here return zero hits):
# ERROR: could not create unique index "receipts_linearized_uniqueness"
# DETAIL: Key (room_id, receipt_type, user_id)=(!zXfJBqSUvXySmsZMtB:jki.re, m.read, @tomtau.:matrix.org) is duplicated.
SELECT
room_id, receipt_type, user_id,
COUNT( * )
FROM
receipts_linearized
GROUP BY
room_id, receipt_type, user_id
HAVING
COUNT( * ) > 1
ORDER BY
room_id, receipt_type, user_id;
(I have not run any of the delete queries yet)
Is there anything to do right now other than to change the locale?
Not really. There are some docs on this, see https://matrix-org.github.io/synapse/latest/postgres.html#fixing-incorrect-collate-or-ctype
It's worth fixing this sooner rather than later---as you say, any database problems now will compound into the future.
Going through the troubleshooting-sql-queries of the linked issues, the following select query returns results (all others listed here return zero hits):
I would avoid looking at that specific issue because Synapse's database schema has likely changed a lot since then; it's best to respond to your specific database directly. When you come to import into a new database, postgres will tell you about all the duplicate rows it sees.
Thanks for the help!
I was able to dump, recreate and import and as anticipated there were some duplicates in the receipts_linearized
table which I had to manually delete from the SQL dump (about 20 or so entries) before it would import completely.
Description
I upgraded my server where Postgres 13 and Synapse 1.83.0 was installed, to Postgres 15 and Synapse 1.93.0. The process was:
After that my element-web session was still logged in and seeing all conversations as I left off, but I could not interact (write or receive messages) from some of them. My Phone and element desktop were both logged out and upon logging back in a lot of the conversations/rooms were missing (those that were "broken" in the element-web session). All rooms where people from another server were involved could be recovered, by sending myself the "share room" link from the element-web session and clicking on it in the session on my phone but rooms where only my server was involved and got "lost" could not be recovered that way. Rejoining a room this way produces a "you made no changes" Event in the room only visible in the one client that was used to join the room again.
The loss of conversations/rooms seem to be based on creation time of the room and the time goes back roughly two years (dated based on when I remember joining some groups that still worked vs were broken).
It is not very tragic now but it was certainly very annoying and could be tragic on larger installs (I only have a small hand full of users that I know very well so guiding them through the "recovery" was not a huge deal). How can I debug this further and what can I do to prevent this in the future?
Steps to reproduce
Homeserver
chaosfield.at
Synapse Version
1.93.0
Installation Method
Other (please mention below)
Database
Single Postgresql, always was Postgres, never restored from backup.
Workers
Single process
Platform
Archlinux running as a Xen VM. Postgresql and Synapse directly installed from the repository.
Configuration
No response
Relevant log output
When accessing one of the unrecoverable rooms from the element web session that can still see them all:
When trying to join the unrecoverable room from my android device:
Anything else that would be useful to know?
No response