matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

Sending typing EDUs causes high load in the federationapi #2182

Open S7evinK opened 2 years ago

S7evinK commented 2 years ago

Filing this mostly as a reminder. Possibly related, since this can degrade QoS: #1622 and maybe #2079.

Steps to reproduce

Several of those log entries:

time="2022-02-11T23:22:20.444381367Z" level=info msg="Sending EDU event" destinations=888 edu_type=m.typing

Disabling "Send typing notifications" in Element Web helps in this case, but things like read markers could probably cause the same behavior on busy servers/rooms.

kegsay commented 2 years ago

This happens because Dendrite hasn't yet blacklisted many of those servers. Attempting to send data to those servers causes high load.

neilalexander commented 2 years ago

The federation API creates a goroutine for each destination, so in this case 888 goroutines, one per destination. That creates a spike as each destination queue wakes up, checks the database for things to send and then creates federation connections. We probably want to run a profile at some point to find out exactly which part of the process is the most expensive, as I can quite believe that the database operations are using the most CPU time.
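As a rough illustration of the profiling idea, here is a minimal sketch that captures a CPU profile around a burst of work using Go's standard `runtime/pprof` package. The helper name `cpuProfile` and the `work` callback are hypothetical and not part of Dendrite; the callback stands in for whatever triggers the EDU fan-out.

```go
package main

import (
	"bytes"
	"runtime/pprof"
)

// cpuProfile runs work while the Go CPU profiler is active and returns the
// raw profile bytes. Hypothetical helper, not a Dendrite API. If work panics,
// the profiler is left running, which is acceptable for a one-off experiment.
func cpuProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	// StartCPUProfile fails if a CPU profile is already being collected.
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile returns only after the profile has been written to buf.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}
```

The returned bytes can be written to a file and inspected with `go tool pprof` to see whether database access, connection setup or something else dominates.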

We see similar spikes on dendrite.matrix.org and similar servers, so we might want to come up with a way of limiting the number of goroutines that are created for outbound federation in general. I suspect that may mean some transactions to some servers take longer to send, because they end up queued behind others.

kegsay commented 2 years ago

A worker pool model may be better here, e.g. hash(server_name) mod N for N workers. The workers can either always be running or be created and killed on demand. The former is simpler, but then the goroutines sit around forever, which may not be a problem as parked goroutines aren't particularly expensive?