matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

cannot gracefully restart polylith federation-sender or sync-api containers #1673

Closed travnewmatic closed 3 years ago

travnewmatic commented 3 years ago

Background information

Description

The federation sender and sync API components do not restart gracefully.

Steps to reproduce

Both components fail with the same error (I think): kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.

Federation sender logs:

[tnewman@s540 dendrite]$ k logs federation-sender-67b9bb4d69-jhddd 
time="2020-12-29T09:16:00Z" level=info msg="Starting \"federationsender\" component"
time="2020-12-29T09:16:00.141377854Z" level=info msg="Dendrite version 0.3.4" func="NewBaseDendrite\n\t" file=" [github.com/matrix-org/dendrite/setup/base.go:102]"
time="2020-12-29T09:16:00.156330524Z" level=panic msg="failed to start room server consumer" func="NewInternalAPI\n\t" file=" [github.com/matrix-org/dendrite/federationsender/federationsender.go:76]" error="kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition."
panic: (*logrus.Entry) 0xc00027f1f0

goroutine 1 [running]:
github.com/sirupsen/logrus.Entry.log(0xc00017cc40, 0xc00021b7d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        github.com/sirupsen/logrus@v1.7.0/entry.go:255 +0x325
github.com/sirupsen/logrus.(*Entry).Log(0xc00027f180, 0x0, 0xc0001d7cc0, 0x1, 0x1)
        github.com/sirupsen/logrus@v1.7.0/entry.go:283 +0xf0
github.com/sirupsen/logrus.(*Entry).Panic(0xc00027f180, 0xc0001d7cc0, 0x1, 0x1)
        github.com/sirupsen/logrus@v1.7.0/entry.go:321 +0x55
github.com/matrix-org/dendrite/federationsender.NewInternalAPI(0xc0003889a0, 0xc000196980, 0x12a0d60, 0xc0001c3ad0, 0xc0001c3aa0, 0x0, 0x0)
        github.com/matrix-org/dendrite/federationsender/federationsender.go:76 +0x3c5
github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities.FederationSender(0xc0003889a0, 0xc00019c800)
        github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities/federationsender.go:30 +0xa9
main.main()
        github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/main.go:77 +0x84e

Sync API logs:

[tnewman@s540 dendrite]$ k logs sync-api-88d6d5bd7-fj4kv 
time="2020-12-29T09:15:52Z" level=info msg="Starting \"syncapi\" component"
time="2020-12-29T09:15:52.414534893Z" level=info msg="Dendrite version 0.3.4" func="NewBaseDendrite\n\t" file=" [github.com/matrix-org/dendrite/setup/base.go:102]"
time="2020-12-29T09:15:52.451644075Z" level=panic msg="failed to start key change consumer" func="AddPublicRoutes\n\t" file=" [github.com/matrix-org/dendrite/syncapi/syncapi.go:71]" error="kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition."
panic: (*logrus.Entry) 0xc00034ba40

goroutine 1 [running]:
github.com/sirupsen/logrus.Entry.log(0xc000166c40, 0xc0003559b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        github.com/sirupsen/logrus@v1.7.0/entry.go:255 +0x325
github.com/sirupsen/logrus.(*Entry).Log(0xc00034b9d0, 0x0, 0xc00035fb28, 0x1, 0x1)
        github.com/sirupsen/logrus@v1.7.0/entry.go:283 +0xf0
github.com/sirupsen/logrus.(*Entry).Logf(0xc00034b9d0, 0x0, 0x112ad8c, 0x23, 0x0, 0x0, 0x0)
        github.com/sirupsen/logrus@v1.7.0/entry.go:329 +0xe5
github.com/sirupsen/logrus.(*Entry).Panicf(...)
        github.com/sirupsen/logrus@v1.7.0/entry.go:367
github.com/matrix-org/dendrite/syncapi.AddPublicRoutes(0xc00032c480, 0x12989a0, 0xc0003701e0, 0x12a0d60, 0xc0003552f0, 0x12935a0, 0xc000370220, 0xc00028ac00, 0xc000334558)
        github.com/matrix-org/dendrite/syncapi/syncapi.go:71 +0x6d0
github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities.SyncAPI(0xc00034b8f0, 0xc000334000)
        github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities/syncapi.go:29 +0x107
main.main()
        github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/main.go:77 +0x84e

I'm able to restart both containers (pods) by dropping and recreating their respective databases.

kegsay commented 3 years ago

You need to keep the polylith component DBs in sync with the Kafka database as they store offsets into the Kafka topics. This is the reason why they fail to start up.

Closing this as a configuration issue, and also because we are going to move away from Kafka in the future to use NATS.