Closed jellykells closed 2 years ago
I am seeing the same error. Also running in monolith mode with Postgres and no Docker, but Go version is 1.16.4
Jul 16 22:11:01 host dendrite-monolith-server[26160]: time="2021-07-16T20:11:01.618840142Z" level=warning msg="Rewriting snapshots 7500-7600 of 31277..." func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:140]"
Jul 16 22:11:01 host dendrite-monolith-server[26160]: time="2021-07-16T20:11:01.674271029Z" level=warning msg="State storage upgrade complete" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:178]"
Jul 16 22:11:01 host dendrite-monolith-server[26160]: time="2021-07-16T20:11:01.681480025Z" level=panic msg="failed to connect to room server db" func="NewInternalAPI\n\t" file=" [roomserver.go:54]" error="RunDeltas: Failed run migration: ERROR 2021041615092700_state_blocks_refactor.go: failed to run Go migration function func(*sql.Tx) error: cannot xref null state block with snapshot 7503: sql: no rows in result set"
Jul 16 22:11:01 host dendrite-monolith-server[26160]: panic: (*logrus.Entry) 0xc00031c230
Jul 16 22:11:01 host dendrite-monolith-server[26160]: goroutine 1 [running]:
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus.(*Entry).log(0xc00031c1c0, 0x0, 0xc00026c270, 0x23)
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus@v1.8.0/entry.go:259 +0x2e5
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus.(*Entry).Log(0xc00031c1c0, 0x0, 0xc0000f7c08, 0x1, 0x1)
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus@v1.8.0/entry.go:285 +0x86
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus.(*Entry).Logf(0xc00031c1c0, 0x0, 0x11d5886, 0x23, 0x0, 0x0, 0x0)
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus@v1.8.0/entry.go:330 +0xe5
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus.(*Entry).Panicf(...)
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/sirupsen/logrus@v1.8.0/entry.go:368
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/matrix-org/dendrite/roomserver.NewInternalAPI(0xc000258ea0, 0x1332060, 0xc000126420, 0xc000204800, 0x1353078)
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/matrix-org/dendrite/roomserver/roomserver.go:54 +0x20c
Jul 16 22:11:01 host dendrite-monolith-server[26160]: main.main()
Jul 16 22:11:01 host dendrite-monolith-server[26160]: github.com/matrix-org/dendrite/cmd/dendrite-monolith-server/main.go:86 +0x271
Jul 16 22:11:01 host systemd[1]: dendrite.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 16 22:11:01 host systemd[1]: dendrite.service: Failed with result 'exit-code'.
So this error can happen when converting rooms based on Dendrite v0.1.0 which didn't correctly set the state block for m.room.create
events. If you're getting this error that implies that there is a bad state snapshot for an event that isn't the m.room.create
event, which is unusual and unexpected.
To fix this, you can delete the offending snapshot entirely: though this may result in room state errors later down the line for this specific room. I would be interested in the output to the following command:
SELECT * FROM roomserver_events WHERE state_snapshot_nid = 1234;
where 1234
is the offending snapshot NID (7503 or 13936 in this case). This information shouldn't be sensitive so can be inserted into this issue.
I ran the command and the output is empty:
dendrite_room_server=# SELECT * FROM roomserver_events WHERE state_snapshot_nid = 7503;
event_nid | room_nid | event_type_nid | event_state_key_nid | sent_to_output | state_snapshot_nid | depth | event_id | reference_sha256 | auth_event_nids | is_rejected
-----------+----------+----------------+---------------------+----------------+--------------------+-------+----------+------------------+-----------------+-------------
(0 rows)
Okay, so I can modify this check to not error out if there are no events. I don't know how the database got into this state (as it implies you have snapshots which aren't referenced by any events), but that shouldn't be fatal. Expect a patch on Monday on master
if you don't mind experimenting with it (back up your database first!).
@kegsay I upgraded to v0.4.1 and while the state migration got further this time, it failed with a new error while rewriting the last batch. Is there a way to determine which event failed to be converted?
Jul 26 22:18:42 dendrite-host dendrite-monolith-server[7399]: time="2021-07-27T03:18:42.875641660Z" level=warning msg="Rewriting snapshots 21300-21400 of 21375..." func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:140]"
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: time="2021-07-27T03:18:47.771193581Z" level=warning msg="State storage upgrade complete" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:247]"
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: time="2021-07-27T03:18:47.890564448Z" level=panic msg="failed to connect to room server db" func="NewInternalAPI\n\t" file=" [roomserver.go:54]" error="RunDeltas: Failed run migration: ERROR 2021041615092700_state_blocks_refactor.go: failed to run Go migration function func(*sql.Tx) error: 1 events exist in roomserver_events which have not been converted to a new state_snapshot_nid; this is a bug, please report"
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: panic: (*logrus.Entry) (0x126ee00,0xc0001701c0)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: goroutine 1 [running]:
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus.(*Entry).log(0xc000170150, 0x0, 0xc00003a1e0, 0x23)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x2e2
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus.(*Entry).Log(0xc000170150, 0x0, 0xc000283b88, 0x1, 0x1)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x86
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus.(*Entry).Logf(0xc000170150, 0x0, 0x12a5a6c, 0x23, 0x0, 0x0, 0x0)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus@v1.8.1/entry.go:338 +0xe2
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus.(*Entry).Panicf(...)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/sirupsen/logrus@v1.8.1/entry.go:376
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/matrix-org/dendrite/roomserver.NewInternalAPI(0xc00022e240, 0x1483fc0, 0xc0002f8840, 0xc000235400, 0x14af7c0)
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/matrix-org/dendrite@/roomserver/roomserver.go:54 +0x237
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: main.main()
Jul 26 22:18:47 dendrite-host dendrite-monolith-server[7399]: github.com/matrix-org/dendrite@/cmd/dendrite-monolith-server/main.go:88 +0x27c
Jul 26 22:18:47 dendrite-host systemd[1]: dendrite.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 26 22:18:47 dendrite-host systemd[1]: dendrite.service: Failed with result 'exit-code'.
Yes, I can add additional logging here to help debug this issue.
I have the same issue.
(And when I was trying to upgrade to v0.4.0, I had the same issue as in the original post)
I upgraded to 4cc8b28b7f341c21a0bf7a261d6b49e8276e1e99 and migration is failing with a new error this time.
WARN[2021-08-04T20:59:44.817088875Z] [2021041615092700_state_blocks_refactor.go:152] UpStateBlocksRefactor
Rewriting snapshots 20600-20700 of 20675...
WARN[2021-08-04T21:00:06.717207316Z] [2021041615092700_state_blocks_refactor.go:265] UpStateBlocksRefactor
State storage upgrade complete
PANI[2021-08-04T21:00:06.738043629Z] [roomserver.go:54] NewInternalAPI
failed to connect to room server db error="RunDeltas: Failed run migration: ERROR 2021041615092700_state_blocks_refactor.go: failed to run Go migration function func(*sql.Tx) error: 1 rooms exist in roomserver_rooms which have not been converted to a new state_snapshot_nid; this is a bug, please report"
panic: (*logrus.Entry) 0xc0002481c0
goroutine 1 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc000248150, 0x0, 0xc001574660, 0x23)
github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x2e5
github.com/sirupsen/logrus.(*Entry).Log(0xc000248150, 0x0, 0xc0000f5bf0, 0x1, 0x1)
github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x86
github.com/sirupsen/logrus.(*Entry).Logf(0xc000248150, 0x0, 0x1245bfe, 0x23, 0x0, 0x0, 0x0)
github.com/sirupsen/logrus@v1.8.1/entry.go:338 +0xe5
github.com/sirupsen/logrus.(*Entry).Panicf(...)
github.com/sirupsen/logrus@v1.8.1/entry.go:376
github.com/matrix-org/dendrite/roomserver.NewInternalAPI(0xc0000eb320, 0x13ac080, 0xc000276e10, 0xc000115f00, 0x13ce7c8)
github.com/matrix-org/dendrite/roomserver/roomserver.go:54 +0x20c
main.main()
github.com/matrix-org/dendrite/cmd/dendrite-monolith-server/main.go:89 +0x271
@anton-molyboha on postgres or sqlite?
@anton-molyboha on postgres or sqlite?
Postgres.
I'm also still seeing the same error using 4cc8b28. Running with Go 1.16.5 instead of 1.13.8 now
Edit: The same error that I last posted, not the error anton posted.
@jellykells Are you positive? There are a few similar looking messages (the table names vary) so can you double check it?
I might need some more input on your precise DB to help debug this further.
@kegsay Yes I just double checked. I am running commit 4cc8b28b:
Aug 09 07:22:52 dendrite-host systemd[1]: Started Dendrite Matrix server.
Aug 09 07:22:52 dendrite-host dendrite-monolith-server[160547]: time="2021-08-09T12:22:52.304521683Z" level=info msg="Dendrite version 0.4.1+4cc8b28b" func="NewBaseDendrite\n\t" file=" [base.go:113]"
The same error occurs:
Aug 09 07:52:05 dendrite-host dendrite-monolith-server[160547]: time="2021-08-09T12:52:05.624190350Z" level=warning msg="Rewriting snapshots 21300-21400 of 21375..." func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:152]"
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: time="2021-08-09T12:52:11.596932232Z" level=warning msg="State storage upgrade complete" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:259]"
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: time="2021-08-09T12:52:11.651450905Z" level=panic msg="failed to connect to room server db" func="NewInternalAPI\n\t" file=" [roomserver.go:54]" error="RunDeltas: Failed run migration: ERROR 2021041615092700_state_blocks_refactor.go: failed to run Go migration function func(*sql.Tx) error: 1 events exist in roomserver_events which have not been converted to a new state_snapshot_nid; this is a bug, please report"
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: panic: (*logrus.Entry) 0xc00018a150
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: goroutine 1 [running]:
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus.(*Entry).log(0xc00018a0e0, 0x0, 0xc0003242a0, 0x23)
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x2e5
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus.(*Entry).Log(0xc00018a0e0, 0x0, 0xc000193bf0, 0x1, 0x1)
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x86
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus.(*Entry).Logf(0xc00018a0e0, 0x0, 0x1234c67, 0x23, 0x0, 0x0, 0x0)
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus@v1.8.1/entry.go:338 +0xe5
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus.(*Entry).Panicf(...)
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/sirupsen/logrus@v1.8.1/entry.go:376
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/matrix-org/dendrite/roomserver.NewInternalAPI(0xc000189170, 0x139b040, 0xc0004008a0, 0xc00012ef80, 0x13bd788)
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/matrix-org/dendrite/roomserver/roomserver.go:54 +0x20c
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: main.main()
Aug 09 07:52:11 dendrite-host dendrite-monolith-server[160547]: github.com/matrix-org/dendrite/cmd/dendrite-monolith-server/main.go:89 +0x271
Aug 09 07:52:11 dendrite-host systemd[1]: dendrite.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 09 07:52:11 dendrite-host systemd[1]: dendrite.service: Failed with result 'exit-code'.
Happy to provide additional info if necessary.
I'll add some additional logging here and let you know when you can try again.
Any updates on this?
I'll look into this this week and push some logs to master
. I've been on holiday for a bit.
@jellykells and @anton-molyboha please try the latest master
and give me the log output when it fails (in this issue should be fine). There will be queries you can run to manually fix this, but it will involve data loss for the affected room.
@kegsay The last messages in the output:
time="2021-09-07T16:38:28.883294305Z" level=warning msg="Rewriting snapshots 21200-21300 of 21244..." func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:152]"
time="2021-09-07T16:38:40.035361023Z" level=error msg="Affected row: room_id=!xxyFVdFcocTnFefXVY:matrix.org snapshot=3967 last_sent=794889" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:289]"
time="2021-09-07T16:38:40.035934574Z" level=error msg="To fix this manually, run this query first then retry the migration: UPDATE roomserver_rooms SET state_snapshot_nid=0 WHERE room_id='!xxyFVdFcocTnFefXVY:matrix.org'" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:293]"
time="2021-09-07T16:38:40.035974398Z" level=error msg="Running this UPDATE will cause the room in question to become unavailable on this server. Leave and re-join the room afterwards." func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:295]"
time="2021-09-07T16:38:40.037369382Z" level=warning msg="State storage upgrade complete" func="UpStateBlocksRefactor\n\t" file=" [2021041615092700_state_blocks_refactor.go:297]"
time="2021-09-07T16:38:40.071962625Z" level=panic msg="failed to connect to room server db" func="NewInternalAPI\n\t" file=" [roomserver.go:54]" error="RunDeltas: Failed run migration: ERROR 2021041615092700_state_blocks_refactor.go: failed to run Go migration function func(*sql.Tx) error: 1 rooms exist in roomserver_rooms which have not been converted to a new state_snapshot_nid; this is a bug, please report"
panic: (*logrus.Entry) 0xc000240460
I will try following the instructions and report here how it goes.
Running that "UPDATE" SQL statement solved the issue! I am now running commit 7dc8fb1fe780888d5758bf665949e9e09d21e56a. Thank you!
I ended up resetting my Dendrite server due to other issues, but I believe this specific issue was also fixed for me by running the suggested SQL statement added in #2005 , so I'll go ahead and close this.
Background information
go version
: 1.13.8Description
I'm not sure how to interpret this error. Clearly the state storage upgrade did not complete, despite the log saying it did.
Steps to reproduce