Closed SentToDevNull closed 2 years ago
Update: Even killing and restarting the server doesn't stop the process. Once dendrite is launched again, it continues trying to fetch history forever.
(The only solution I've found that helps is killing the process, then completely wiping all databases, restarting, and remembering not to join rooms on other servers.)
Occasionally, I also get events like the following written to stdout:
WARN[2021-07-26T02:11:39.349828028Z] [send.go:280] processTransaction Transaction: Failed to query room version for room!JlytvgrOTrGPXOfjrK:techlore.net error="QueryRoomVersionForRoom: missing room info for room !JlytvgrOTrGPXOfjrK:techlore.net" req.id=uDO5hPlLj5Ll req.method=PUT req.path=/_matrix/federation/v1/send/1627042800413
(Also, this is not just an issue faced when connecting to the techlore.net server. All matrix.org servers I've tried behave the same way.)
When creating rooms, there is an option to limit a user such that they only see messages created after joining the room.
When entering public rooms on other servers that allow new users to see all past history, is it possible for the user on a dendrite server to choose only to see messages generated after they join? (That could be a useful workaround.)
More Context:
This seems to be related to the backfill API: https://spec.matrix.org/unstable/server-server-api/#backfilling-and-retrieving-missing-events.
It appears that the limit for backfilling messages when joining a room is set in federationapi/routing/backfill.go#62 by the server itself. Is there an option somewhere in the codebase where the user can decide what limit to use when backfilling after joining a room on another server? If not, that seems like a very necessary feature.
How much RAM does the system have?
1GiB memory + 1 GiB swap
I was able to get federation to work properly (and am now able to join rooms) by limiting the number of backfill requests to make when joining a room in roomserver/internal/perform/perform_backfill.go#L474:
//tx, err := b.fsAPI.Backfill(ctx, server, roomID, limit, fromEventIDs)
tx, err := b.fsAPI.Backfill(ctx, server, roomID, 100, fromEventIDs)
There should be a better way to do this though. Perhaps we should add some logic to look for an option called backfill_limit_override in dendrite.yaml and set the limit parameter to the lower of "backfill_limit_override" and the "limit retrieved from server".
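A rough sketch of that clamping logic, assuming the override has already been read from dendrite.yaml into an int. Neither the backfill_limit_override option nor the `effectiveBackfillLimit` function exists in dendrite; both are hypothetical names for this proposal.

```go
package main

import "fmt"

// effectiveBackfillLimit returns the limit that would be passed to the
// backfill call. "requested" is the limit the server would otherwise use;
// "override" stands for the hypothetical backfill_limit_override setting.
// Assumption: an override of zero or less means the option is not set.
func effectiveBackfillLimit(requested, override int) int {
	if override > 0 && override < requested {
		return override
	}
	return requested
}

func main() {
	fmt.Println(effectiveBackfillLimit(1000, 100)) // override is lower, so 100
	fmt.Println(effectiveBackfillLimit(50, 100))   // requested is lower, so 50
	fmt.Println(effectiveBackfillLimit(1000, 0))   // no override set, so 1000
}
```

With something like this in place, the hard-coded 100 in the call above could become effectiveBackfillLimit(limit, override) instead.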
Update: Hard-coding an initial backfill limit upon room joining works when joining new servers only. dendrite still does crash sometimes, but at least I can restart it and join those new servers.
I am unable to join servers that I initially tried connecting to before creating a hard-coded backfill limit. Even after wiping my databases and generating a new matrix private key, regardless of whether or not I try to join them, those servers I had previously tried to join without a backfill limit spam me with backfill events that are causing my server to crash.
Though limiting the initial backfill when joining a room works, when I try to load more events (scrolling up in my Matrix client seems to request ~50 events prior to the last one I have), my server crashes after a while (without any error messages). When I restart dendrite, I am able to immediately see all those events that I just requested.
I am unable to join #dendrite:matrix.org using dendrite 0.5.0 :)
It hasn't crashed or leaked a noticeable amount of memory, but this is on a far more powerful system than @SentToDevNull is using, with 32GB ram, postgres backend on nvme, monolithic binary compiled with golang 1.15.9 on debian 11.
In Element I can see the list of people, but the chat history is an endless (I gave up after about an hour) spinner. The Dendrite stdout log shows a seemingly unlimited history retrieval, with heavy usage of several CPU cores. I guess this is the same issue, just without the crash because the system doesn't run out of resources.
@bones-was-here hey, I'm looking at this issue because this has been happening to me yesterday and today as well! Almost exactly the same problem, down to the server specs even, though I suspected that my slow internet was the culprit.
Nah it's not caused by slow internet, the server I'm using is in a datacentre in Germany with several Gbps connection speed.
When you join a big room there's a lot of state to check. This will cause a memory spike. I think @neilalexander has resolved this with his optimisation work over the past few months?
The situation is certainly better than it was, but there are still unavoidable memory spikes when joining particularly big rooms. I'd be surprised if those spikes went much bigger than a few hundred MB though, unless the auth chain is exceptionally large.
Closing this for now then.
Background information
go version: go version go1.15.9 linux/amd64
Description
I'm using the latest commit as of writing this. (I have experienced the same issue with the latest "release" as well.)
When I launch my server, it seems to work well until I try to join a room on another server with too much history.
Whenever I join any room on another server with a lot of messages, the process of joining the room hangs (visibly, seemingly forever) until my memory usage climbs so high that I have to power off my server (because there's no longer enough memory for me to spawn a shell and kill the process).
Edit: Sometimes, but not always, after it reaches this point, my scheduler appears to kill the dendrite server and I don't have to reboot.
Steps to reproduce
I'm not doing anything too complicated:
Here is my dendrite.yaml file.
The console output when I attempt to join other servers is just a seemingly-never-ending stream of INFO statements like the following:
I think dendrite is trying to load all history since the beginning of time for every room on every server I join. It just keeps receiving transactions for messages until I run out of memory. One would expect that by default it would only load maybe a couple hundred messages or so, or at least that it would decide to stop loading history after it eats all available memory.