Open edocod1 opened 2 years ago
After rebooting the server, it looks like a lot of messages are stuck in the `DendriteOutputRoomEvent` stream:
```
[23931] [INF]
[23931] [INF] ---------------- JETSTREAM ----------------
[23931] [INF] Max Memory: 23.00 GB
[23931] [INF] Max Storage: 7.24 GB
[23931] [INF] Store Directory: "jetstream"
[23931] [INF] -------------------------------------------
[23931] [INF] Restored 0 messages for stream '$G > DendriteInputDeviceListUpdate'
[23931] [INF] Restored 0 messages for stream '$G > DendriteInputRoomEvent'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputClientData'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputKeyChangeEvent'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputNotificationData'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputReadUpdate'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputReceiptEvent'
[23931] [INF] Restored 14,609 messages for stream '$G > DendriteOutputRoomEvent'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputSendToDeviceEvent'
[23931] [INF] Restored 0 messages for stream '$G > DendriteOutputStreamEvent'
[23931] [INF] Recovering 1 consumers for stream - '$G > DendriteInputDeviceListUpdate'
[23931] [INF] Recovering 21 consumers for stream - '$G > DendriteInputRoomEvent'
[23931] [INF] Recovering 1 consumers for stream - '$G > DendriteOutputClientData'
[23931] [INF] Recovering 2 consumers for stream - '$G > DendriteOutputKeyChangeEvent'
[23931] [INF] Recovering 1 consumers for stream - '$G > DendriteOutputNotificationData'
[23931] [INF] Recovering 1 consumers for stream - '$G > DendriteOutputReadUpdate'
[23931] [INF] Recovering 2 consumers for stream - '$G > DendriteOutputReceiptEvent'
[23931] [INF] Recovering 2 consumers for stream - '$G > DendriteOutputRoomEvent'
[23931] [INF] Recovering 2 consumers for stream - '$G > DendriteOutputSendToDeviceEvent'
[23931] [INF] Recovering 1 consumers for stream - '$G > DendriteOutputStreamEvent'
[23931] [INF] Server is ready
```
Have you tried moving the jetstream directory out of the way before a restart (`jetstream/$G/streams/`)? It sometimes seems to get stuck on an upgrade...
Yes, I have tried that, with no success. But doing that does unlock some more log entries which wouldn't otherwise appear:
Can you please enable profiling and then take a goroutine profile after trying to send some messages into a room?
The profile URL will look like this: http://localhost:65432/debug/pprof/goroutine?debug=1
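For reference, a capture could look something like the sketch below. The base URL matches the one above; the output filenames and the `PPROF` variable are hypothetical, and profiling has to be enabled in Dendrite first.

```shell
# Sketch: fetch goroutine profiles from Dendrite's pprof listener.
# Base URL taken from the comment above; filenames are hypothetical.
PPROF=${PPROF:-http://localhost:65432}
for d in 1 2; do
  # debug=1 gives aggregated stacks; debug=2 dumps every goroutine.
  curl -sf "$PPROF/debug/pprof/goroutine?debug=$d" -o "goroutine-debug$d.txt" \
    || echo "debug=$d capture failed - is the pprof listener up?"
done
```

Either form can then be attached to the issue as a text file.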
It seems I had the same problem after a power shutdown. Removing the jetstream folder fixed it (thanks @spaetz), but I lost all unbridged messages.
By the way, when using `ls` I don't see `/jetstream/$G/streams` but instead an endless list of subdirectories named `$G/$G/$G`... `du`, however, shows the right tree.
@neilalexander profile here: profile.txt
Can you please also capture a http://localhost:65432/debug/pprof/goroutine?debug=2 (that's `debug=2` instead of `debug=1`)?
Ideally without restarting the process if you can, since that'll nuke possibly valuable information from the trace.
Here it is: https://paste.debian.net/1246946/
Right, my suspicion is that there's a problem with the database connection pool. It looks like Dendrite is stuck waiting for database connections, which probably explains why it isn't progressing.
Can you please tell me whether you have configured database connections separately per component, or configured them once in the `global.database` section of the config? What values do you have for `max_open_conns`, `max_idle_conns` and `conn_max_lifetime`?
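For context, these settings can live either under each component's database block or once under `global.database`; a minimal sketch of the latter is shown below. The connection string and values are purely illustrative, not a recommendation.

```yaml
# Illustrative sketch of a global database section in dendrite.yaml.
global:
  database:
    connection_string: postgresql://dendrite:secret@localhost/dendrite?sslmode=disable
    max_open_conns: 10     # cap on concurrent connections to the database
    max_idle_conns: 2      # connections kept open while idle
    conn_max_lifetime: -1  # -1 = connections are never expired
```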
I have configured database connections separately (though they all point to the same db). The limits are these:
```shell
# cat dendrite.yaml | grep max_open
  # The "max_open_conns" and "max_idle_conns" settings configure the maximum
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
    max_open_conns: 10
root@dendrite:~/dendrite# cat dendrite.yaml | grep max_idle
  # The "max_open_conns" and "max_idle_conns" settings configure the maximum
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
    max_idle_conns: 2
root@dendrite:~/dendrite# cat dendrite.yaml | grep conn_max_l
  # "conn_max_lifetime" option controls the maximum length of time a database
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
    conn_max_lifetime: -1
```
Hmm, that all looks reasonable... on a second look, the database connections are probably waiting normally (I was looking at `sql.go` from the wrong Go version, and the line numbers differed).
Can I ask you to try the latest commit 55d81beab on the `neilalexander/nats290` branch, which updates the NATS Server to a newer version, just to rule that out?
Looks like nothing changed. After running for some time, I still have the exact same number of messages in the JetStream queue, and everything is stuck as before.
I occasionally run into this error too. The classic symptom is that messages seem to send correctly in Element, but they never reach their destination. When Element is closed and reopened, "some recent messages were not sent" pops up.
It had unlocked for some days, but since yesterday it is stuck again. Using Dendrite 0.9.1 now. The log is full of "Failed do manually update device lists for user".
Background information
go version: go 1.17.18
Description
Steps to reproduce
I would write to the Dendrite support room, but with my homeserver not working it is very hard to do so. This is not a "pls fix" request but more of a "hey, how do I troubleshoot this?"