matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

Communication with Kafka/Zookeeper fails #1928

Closed (tacerus closed this issue 2 years ago)

tacerus commented 3 years ago

Hi,

Background information

Description

Steps to reproduce

Dendrite clientapi crashes immediately with the following output:

Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: time="2021-07-18T04:40:03.255967082Z" level=panic msg="failed to start kafka consumer" func="setupKafka\n\t" file=" [kafka.go:27]" error="kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: panic: (*logrus.Entry) 0xc000200c40
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: goroutine 38 [running]:
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/sirupsen/logrus.(*Entry).log(0xc000200bd0, 0x0, 0xc00023e360, 0x1e)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/sirupsen/logrus@v1.8.1/entry.go:259 +0x2e5
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/sirupsen/logrus.(*Entry).Log(0xc000200bd0, 0x0, 0xc0001d1d38, 0x1, 0x1)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/sirupsen/logrus@v1.8.1/entry.go:293 +0x86
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/sirupsen/logrus.(*Entry).Panic(...)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/sirupsen/logrus@v1.8.1/entry.go:331
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/matrix-org/dendrite/setup/kafka.setupKafka(0xc0001ac090, 0x41cf50, 0xc0002406c0, 0x30, 0x30)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/matrix-org/dendrite/setup/kafka/kafka.go:27 +0x145
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/matrix-org/dendrite/setup/kafka.SetupConsumerProducer(0xc0001ac090, 0x7ffa417fb000, 0x30, 0x30, 0x7ffa41966108)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/matrix-org/dendrite/setup/kafka/kafka.go:15 +0x70
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/matrix-org/dendrite/clientapi.AddPublicRoutes(0xc0001986c0, 0xc000198a80, 0xc0001ac1d0, 0x13cef20, 0xc00009fd40, 0xc0000f7d00, 0x13d30a0, 0xc000240630, 0x13afc00, 0xc0001c5480, ...)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/matrix-org/dendrite/clientapi/clientapi.go:52 +0x4c
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities.ClientAPI(0xc000037830, 0xc0001ac000)
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/personalities/clientapi.go:35 +0x2be
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]: created by main.main
Jul 18 06:40:03 matrix01.sun.lysergic.dev dendrite-polylith-multi[26117]:         github.com/matrix-org/dendrite/cmd/dendrite-polylith-multi/main.go:77 +0x865

Zookeeper prints the following upon receiving the above connection attempt and repeats this a few hundred times:

Jul 18 06:40:03 matrix01.sun.lysergic.dev zookeeper-server-start.sh[4199]: [2021-07-18 06:40:03,002] WARN Exception causing close of session 0x0: null (org.apache.zookeeper.server.NIOServerCnxn)
Jul 18 06:40:03 matrix01.sun.lysergic.dev zookeeper-server-start.sh[4199]: [2021-07-18 06:40:03,255] WARN Exception causing close of session 0x0: null (org.apache.zookeeper.server.NIOServerCnxn)

Reading from and writing to the same Kafka instance with a different client (the Java client described in Kafka's Quick-Start) produces no errors, and the communication is handled without issues.

Besides the default configuration, I tried several network variations, binding the listeners of the involved components to IPv4 addresses, IPv6 addresses, or hostnames (each time changing the Kafka server definition in the Dendrite configuration accordingly). I also tried consumer variations with custom group.id or maxClientCnxns values. None of these attempts succeeded, and the output stayed the same.

Best, Georg

kegsay commented 2 years ago

We're using NATS now.