Here's a short explanation of how I'm running synapse_port_db with Docker:
1. docker-compose run --entrypoint /bin/ash --rm synapse
2. ./start.py, then, once it has finished compiling the config, exit Synapse with Ctrl+C
3. synapse_port_db --sqlite-database /data/homeserver.db.snapshot --postgres-config /compiled/homeserver.yaml
Hi @moqmar. I'm unable to reproduce this by following your instructions with a current copy of Synapse v1.12.1rc1. However, my guess from the traceback is that the issue may have been caused by this line:
Where this SQL:
was able to pull a row from the event_search table that had a NULL 'key' column. This shouldn't be possible, as that column isn't marked as nullable, but perhaps some database corruption occurred. Do you still have access to a database with this problem? If so, I'd like to run some tests with it.
As a workaround you can try searching for any affected rows with the following SQL:
SELECT * FROM event_search WHERE key is NULL;
If any rows turn up, replace the NULL with the correct 'key' column value (look at adjacent rows for examples), or simply delete the affected rows instead. This table is only used for searching events, so removing a few entries is harmless. Afterwards, check whether the porting problems still persist.
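The "delete the affected rows" route of the workaround above can be scripted with Python's built-in sqlite3 module. This is a minimal sketch, assuming you run it against a copy of the SQLite snapshot (such as the /data/homeserver.db.snapshot file passed to synapse_port_db) and that the table and column names are the event_search and key names mentioned above; the function names are hypothetical, not part of Synapse.

```python
import sqlite3


def find_null_key_rows(conn: sqlite3.Connection) -> list:
    """Return every event_search row whose 'key' column is NULL (hypothetical helper)."""
    # Same query as the workaround above. SQLite resolves column names
    # case-insensitively, so this also works if the column is declared as KEY.
    return conn.execute("SELECT * FROM event_search WHERE key IS NULL").fetchall()


def delete_null_key_rows(conn: sqlite3.Connection) -> int:
    """Delete event_search rows with a NULL 'key'; return how many were removed."""
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        cur = conn.execute("DELETE FROM event_search WHERE key IS NULL")
    return cur.rowcount
```

For example, open the copy with sqlite3.connect(...), inspect the output of find_null_key_rows(conn) first, and only then call delete_null_key_rows(conn) before re-running synapse_port_db.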
Hm, doesn't look like that's the issue... This is what the table generally looks like on my instance:
Huh, I can see from your database that your column is named KEY instead of key. I'm not sure how your database ended up that way, but that would explain not being able to retrieve key from the row.
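To confirm how the column is actually spelled in a SQLite snapshot, you can ask SQLite for the declared column names. SQLite itself matches column names case-insensitively in queries, but if the port script keys its rows by the names the database reports (an assumption about its internals), a lookup of "key" would fail against a column declared as KEY. A minimal sketch, with hypothetical helper names:

```python
import sqlite3


def event_search_columns(conn: sqlite3.Connection) -> list:
    """Return event_search's column names exactly as the schema declares them."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in conn.execute("PRAGMA table_info(event_search)")]


def has_uppercase_key_column(conn: sqlite3.Connection) -> bool:
    """True if the column is declared as KEY rather than key."""
    return "KEY" in event_search_columns(conn)
```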
Either rename the database column, or edit synapse_port_db to change row["key"] to row["KEY"].
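Hard-coding row["KEY"] fixes this one database but would break on a correctly created one. A fallback that accepts either spelling could look like the sketch below; get_column is a hypothetical helper for illustration and is not part of synapse_port_db, and it assumes rows are available as a mapping from column name to value.

```python
from typing import Any, Mapping


def get_column(row: Mapping[str, Any], name: str) -> Any:
    """Return row[name], falling back to a case-insensitive column-name match.

    Hypothetical helper: not part of synapse_port_db.
    """
    if name in row:
        return row[name]  # exact match: the normal case
    wanted = name.lower()
    for column, value in row.items():
        if column.lower() == wanted:
            return value  # e.g. a column declared as "KEY" instead of "key"
    raise KeyError(name)
```

Replacing row["key"] with get_column(row, "key") would then work for both the expected and the corrupted schema, while still raising KeyError if the column is missing entirely.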
I can probably test this on Sunday. synapse_port_db currently complains that something is wrong with the default collation of the Postgres Docker image, but I'm probably also using a slightly older version...
Having the incorrect collation is known to cause DB corruption, so that may be the cause of things. I hope the Docker image wasn't the one responsible for creating it, though.
@moqmar Did you get a chance to test this?
Finally got a chance to test this, and yes, changing key to KEY in the migration script fixed the issue! Thanks for your help! Do you want to keep this issue open to add the wrong column name as a fallback in the script? I have no idea where it came from; I never changed anything in the database.
Seems like that was just because I used the default Docker image for Postgres, which automatically creates a database with the wrong collation. I recreated it with the correct one according to the instructions in https://github.com/matrix-org/synapse/blob/master/docs/postgres.md and got the script to run that way.
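The linked docs/postgres.md recommends creating the database with ENCODING 'UTF8', LC_COLLATE='C' and LC_CTYPE='C'. The sketch below shows one way to check an existing database against that before running synapse_port_db: run the query with psql or any Postgres client, then pass the three values to the check. SETTINGS_QUERY and collation_problems are hypothetical names for illustration, not part of Synapse.

```python
# Query to run on the target Postgres database (e.g. with psql) to read
# its encoding, collation and character-type settings.
SETTINGS_QUERY = (
    "SELECT pg_encoding_to_char(encoding), datcollate, datctype "
    "FROM pg_database WHERE datname = current_database()"
)


def collation_problems(encoding: str, collate: str, ctype: str) -> list:
    """List how the settings differ from UTF8 / C / C (hypothetical helper)."""
    problems = []
    if encoding.upper() not in ("UTF8", "UTF-8"):
        problems.append(f"encoding is {encoding}, expected UTF8")
    if collate != "C":
        problems.append(f"LC_COLLATE is {collate}, expected C")
    if ctype != "C":
        problems.append(f"LC_CTYPE is {ctype}, expected C")
    return problems
```

A database initialised with a locale such as en_US.utf8 would report two problems here (LC_COLLATE and LC_CTYPE), while one created following the docs reports none.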
@moqmar Glad to hear it's resolved!
I don't see anything in the codebase that might have caused this, so I'm willing to believe it was random corruption. If that's the case, it could just as easily have affected any column, not just key. If we come across another cause, we will of course file an issue and fix it.
Hmm, I'll have a look at the Docker image's default collation; it seems like it will be a subtle but confusing issue for sysadmins down the road.
Closing this issue for now, thanks again!
If the database is created by the Postgres Docker image, that's not something we can change, since we aren't responsible for those images.
In any case, Synapse now refuses to initialise a new database with the wrong collation, so it will be much harder to fall into this trap.
Description
After some time (not immediately, as with #6544), synapse_port_db aborts with an error at line 597, seemingly at a random point (the last "abcd: 100% (54/54)" line varies with every run):
Version information
Version: 1.7.0
Install method: Docker
Platform: Docker on Linux