matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

UNIQUE constraint failed: federationsender_joined_hosts.event_id #1838

Closed zephryn closed 3 years ago

zephryn commented 3 years ago

Description

when (re)starting my dendrite instance, it panics immediately and logs the following to stdout/stderr. i've removed the timestamps and personal information for brevity, and split one of the single-line log entries across multiple lines to hopefully make it a bit more readable.

these seem normal, just included for context.

level=info msg="Dendrite version 0.3.11" func="NewBaseDendrite\n\t" file=" [github.com/matrix-org/dendrite/setup/base.go:110]"
level=info msg="Enabled perspective key fetcher" func="NewInternalAPI\n\t" file=" [github.com/matrix-org/dendrite/signingkeyserver/signingkeyserver.go:103]" num_public_keys=2 server_name=matrix.org

right after, though, it repeats this error for four different rooms:

level=error
msg="Failed to get server ACLs for room \"!(matrix room id)\""
func="NewServerACLs\n\t"
file=" [github.com/matrix-org/dendrite/roomserver/acls/acls.go:60]"
error="storage: state NIDs missing from the database (0 != 1)"

then, it panics:

level=panic
msg="roomserver output log: write room event failure"
func="onMessage\n\t"
file=" [github.com/matrix-org/dendrite/federationsender/consumers/roomserver.go:113]"
add="[$tOVMb84oHTyCHPye2sOWIu7q8QWbYgPgJvMDDtcHYwg]"
del="[$a8K1pSHfmDL0EuKlw4pelwjWZv1qBlluSIyqfGaPoXU]"
error="UNIQUE constraint failed: federationsender_joined_hosts.event_id"
event="{
    \"auth_events\":[
        \"(event id)\",
        \"(event id)\",
        \"(event id)\"
    ],
    \"content\":{
        \"body\":\"(message body)\",
        \"msgtype\":\"(message type)\"
    },
    \"depth\":(depth),
    \"hashes\":{
        \"sha256\":\"(sha256 hash)\"
    },
    \"origin\":\"(sender's homeserver domain name)\",
    \"origin_server_ts\":(unix timestamp),
    \"prev_events\":[
        \"(event id)\"
    ],
    \"prev_state\":[],
    \"room_id\":\"(matrix room id)\",
    \"sender\":\"(sender of message)\",
    \"signatures\":{
        \"(sender's homeserver domain name)\":{
            \"ed25519:a_qfUj\":\"(ecc key/signature, i assume)\"
        }
    },
    \"type\":\"m.room.message\",
    \"unsigned\":{
        \"age_ts\":(another unix timestamp)
    }
}"
event_id="(event id)"
panic: (*logrus.Entry) 0xc0004de770

goroutine 39 [running]:
github.com/sirupsen/logrus.Entry.log(0xc000032d20, 0xc00007f200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        github.com/sirupsen/logrus@v1.7.0/entry.go:255 +0x325
github.com/sirupsen/logrus.(*Entry).Log(0xc0004de700, 0x0, 0xc0004ef9b8, 0x1, 0x1)
        github.com/sirupsen/logrus@v1.7.0/entry.go:283 +0xf0
github.com/sirupsen/logrus.(*Entry).Logf(0xc0004de700, 0xc000000000, 0x55ba1e22da3d, 0x2f, 0x0, 0x0, 0x0)
        github.com/sirupsen/logrus@v1.7.0/entry.go:329 +0xe5
github.com/sirupsen/logrus.(*Entry).Panicf(...)
        github.com/sirupsen/logrus@v1.7.0/entry.go:367
github.com/matrix-org/dendrite/federationsender/consumers.(*OutputRoomEventConsumer).onMessage(0xc0003a9f00, 0xc0004e4140, 0x1, 0x101)
        github.com/matrix-org/dendrite/federationsender/consumers/roomserver.go:113 +0x735
github.com/matrix-org/dendrite/internal.(*ContinualConsumer).consumePartition(0xc000494b40, 0x55ba1e60b978, 0xc00021dc08)
        github.com/matrix-org/dendrite/internal/consumers.go:126 +0xd8
created by github.com/matrix-org/dendrite/internal.(*ContinualConsumer).StartOffsets
        github.com/matrix-org/dendrite/internal/consumers.go:107 +0x532

before today, things were running just fine, aside from dendrite using a surprising amount of resources at times; not sure whether that has anything to do with this problem, though.

upon further research, the panic error seems to be the same as in issue #1786. i definitely don't recall my homeserver crashing for the reason given there (starting a new conversation), but i'll try to provide updates if the situation changes in the future.
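For readers unfamiliar with the error text, here is a minimal sketch of how SQLite produces a message of this shape. Only the table name and the `event_id` column come from the log above; the other columns, the `record_join` helper, and the sample values are assumptions made for illustration and are not taken from Dendrite's source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: only the table name and the event_id column appear in the
# log; room_id and server_name are illustrative guesses.
conn.execute(
    """
    CREATE TABLE federationsender_joined_hosts (
        room_id     TEXT NOT NULL,
        event_id    TEXT NOT NULL UNIQUE,
        server_name TEXT NOT NULL
    )
    """
)


def record_join(db, room_id, event_id, server_name):
    """Hypothetical helper: store one joined-host row for an event."""
    db.execute(
        "INSERT INTO federationsender_joined_hosts VALUES (?, ?, ?)",
        (room_id, event_id, server_name),
    )


record_join(conn, "!room:example.org", "$event1", "example.org")

# Writing the same event ID a second time violates the UNIQUE constraint.
try:
    record_join(conn, "!room:example.org", "$event1", "example.org")
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

print(error_message)
```

If the same event is written twice, for example because a message is processed again after a restart, the second insert fails in this way. Whether that is what happened here is not confirmed by anything in this thread.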

zephryn commented 3 years ago

update: the server ACL errors might not be directly related to the panic, my bad ^^ issue #1844 seems to describe a similar problem with the NID errors.

zephryn commented 3 years ago

yet another update: fixed it, at least enough to get back into my account! i had to go into federationsender.db and remove the last entry in federationsender_joined_hosts. i also removed the event's entry from roomserver_events in roomserver.db, but i'm not sure whether modifying the second db helped to resolve the issue, since it still panicked as long as the entry in the former was left in place.
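The manual workaround described above could be sketched roughly as follows. This is not an official recovery procedure. The column layout, the `remove_joined_host_rows` helper, and the event IDs are assumptions for illustration, and the sketch deletes by event ID rather than "the last entry". It runs against an in-memory database; on a real server you would stop Dendrite and copy `federationsender.db` before touching it.

```python
import sqlite3


def remove_joined_host_rows(db, event_id):
    """Hypothetical helper: delete rows for one event ID, return the count."""
    with db:  # commits on success, rolls back if the statement fails
        cur = db.execute(
            "DELETE FROM federationsender_joined_hosts WHERE event_id = ?",
            (event_id,),
        )
    return cur.rowcount


# Stand-in for federationsender.db; the schema is assumed, not Dendrite's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE federationsender_joined_hosts (
        room_id     TEXT NOT NULL,
        event_id    TEXT NOT NULL UNIQUE,
        server_name TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO federationsender_joined_hosts VALUES (?, ?, ?)",
    [
        ("!room:example.org", "$older", "a.example.org"),
        ("!room:example.org", "$stuck", "b.example.org"),
    ],
)
conn.commit()

# Take a copy first so the deletion can be undone.
backup = sqlite3.connect(":memory:")
conn.backup(backup)

removed = remove_joined_host_rows(conn, "$stuck")
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT event_id FROM federationsender_joined_hosts ORDER BY rowid"
    )
]
print(removed, remaining)
```

Deleting by a known event ID is a little safer than removing whichever row happens to be last, because it does nothing if the row you expected is not there.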

kegsay commented 3 years ago

This was fixed in https://github.com/matrix-org/dendrite/pull/1824, which is on master but has not been released yet.