matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

dendrite server suddenly goes completely silent #2524

Open grisu48 opened 2 years ago

grisu48 commented 2 years ago

Background information

Description

Since a server reboot this morning I cannot send messages anymore via Element:

image

The server was running fine for the last weeks without any noticeable issues.

In the dendrite log I find error messages like the following:

time="2022-06-09T07:34:51.751232393Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=wV81oVBTTPge req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/$local.c2f3d6f8-eb95-42b3-9ff3-38d8e179e2a9" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:35:15.652729793Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=3Z5ZPJdx0DkE req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:20.461579016Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=ZRaKTs4mqVqg req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:41.052252332Z" level=info msg="Executing UpdateUserDailyVisits"
time="2022-06-09T07:37:28.839353423Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=AbPuX72a3rSu req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:38:45.475754429Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=u5zAstAfeuKd req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"

Steps to reproduce

I'm also attaching the server log after the reboot here: log.txt

grisu48 commented 2 years ago

Does anyone have an idea? My homeserver has been practically toast since an automated reboot last Thursday. I can't send any messages, even to internal channels, and federated channels have remained quiet since then. I still see the error messages quoted above, with level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled", in the logs.

Logging out and back in didn't help, nor did a reboot, an update to the latest container version, or a re-creation of the whole container (using the old data files).

The instance was working fine for at least a month and stopped working without me touching anything. The reboot happened as part of the automated update procedure and does not alter the state of the container or its configuration files.

To me, this issue appeared completely out of the blue, and I am not aware of any change I made that could have triggered it.

neilalexander commented 2 years ago

context canceled normally means that the client gave up waiting for a response from the server so it closed the connection, and instead of continuing to do work for a client that's given up, we stop processing too. Normally it signals that whatever work we were doing wasn't finished.

Is this happening in just specific rooms or is it happening in all of them?

grisu48 commented 2 years ago

This happens on the whole server; in fact, the whole server has been completely silent since I posted this bug. Affected rooms are

The latter also applies to "chatty" rooms where I know that traffic is ongoing. Everything went completely silent.

grisu48 commented 2 years ago

Following the input of @neilalexander I updated the title of this issue.

pcmid commented 2 years ago

I have had the same issue for some days, since upgrading to v0.8.9.

edocod1 commented 2 years ago

Could this be the same issue as #2566?

TheBinaryLoop commented 2 years ago

I have the same issue. Has anybody had any luck resolving this? My homeserver is useless in this state.

neilalexander commented 2 years ago

Have you evaluated whether your PostgreSQL connection counts are correct? Your Dendrite config must not try to use more connections than PostgreSQL is configured to allow or you'll run into problems like this.
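As a rough way to do the check described above, the following Go sketch adds up the connection limits of every pool that talks to the same PostgreSQL server and compares the total with the server limit. All numbers and component names are hypothetical examples. It assumes each database section in your Dendrite config carries its own connection limit (I believe the key is `max_open_conns`, but check your own config) and that PostgreSQL's `max_connections` and `superuser_reserved_connections` settings are the relevant server-side values.

```go
package main

import "fmt"

// poolBudget sums the connection limits of all pools that share one
// PostgreSQL server and reports whether the total fits under the server
// limit. reserved is the number of connections the server keeps back.
func poolBudget(pools map[string]int, serverMax, reserved int) (total int, ok bool) {
	for _, n := range pools {
		total += n
	}
	return total, total <= serverMax-reserved
}

func main() {
	// Hypothetical per-component limits; replace with values from your config.
	pools := map[string]int{
		"roomserver":    30,
		"syncapi":       30,
		"federationapi": 30,
		"userapi":       30,
	}

	// Hypothetical server settings: max_connections=100, 3 reserved.
	total, ok := poolBudget(pools, 100, 3)
	fmt.Printf("configured=%d fits=%v\n", total, ok) // configured=120 fits=false
}
```

If the total does not fit, either lower the per-pool limits or raise the limit on the PostgreSQL side.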

@edocod1 @TheBinaryLoop To be clear, are you seeing context canceled specifically, or a different error?

TheBinaryLoop commented 2 years ago

Yes. I see context canceled



mat-1 commented 1 year ago

I had this issue and fixed it by deleting the jetstream directory.

vpzomtrrfrt commented 11 months ago

Deleting jetstream did not resolve the issue for me; it still appears randomly.

Since there might be multiple issues here, in my case it's:

vpzomtrrfrt commented 10 months ago

My current workaround is a wrapper program that terminates dendrite if it prints /level=error.*context canceled/.
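A wrapper along these lines could look like the Go sketch below. This is not the commenter's actual program, only one possible shape for it: it starts the command given on its command line, echoes its output, and kills the process at the first line matching the pattern quoted above. It assumes that something else (a container restart policy or a service manager) restarts the server afterwards.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"os/exec"
	"regexp"
)

// fatalPattern is the expression quoted in the comment above.
var fatalPattern = regexp.MustCompile(`level=error.*context canceled`)

// shouldKill reports whether a log line matches the pattern.
func shouldKill(line string) bool { return fatalPattern.MatchString(line) }

// watch copies lines from r to w and returns true at the first matching line.
func watch(r io.Reader, w io.Writer) bool {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		fmt.Fprintln(w, sc.Text())
		if shouldKill(sc.Text()) {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: watchdog <command> [args...]")
		os.Exit(2)
	}
	cmd := exec.Command(os.Args[1], os.Args[2:]...)

	// Route both stdout and stderr of the child through one pipe.
	pr, pw := io.Pipe()
	cmd.Stdout = pw
	cmd.Stderr = pw
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Close the pipe once the child exits so that watch sees end of input.
	go func() {
		_ = cmd.Wait()
		pw.Close()
	}()

	if watch(pr, os.Stdout) {
		_ = cmd.Process.Kill() // terminate the server; a supervisor restarts it
		os.Exit(1)
	}
}
```

Run as, for example, `watchdog <path-to-dendrite> <args>`, where the binary path and arguments are whatever you normally use. Note that this only automates the restart; it does not address the underlying cause.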

troed commented 8 months ago

I just had "this" happen to me after an ISP external IP number change. Nothing I did made any difference, but searching for the "context canceled" brought me here. Since restarting (I use the monolith docker image) made no difference I stopped and removed the image, deleted the jetstream volume and then pulled a fresh image. That finally got the server running again.

Dendrite version 0.13.6+87f028d

(It was at 0.13.5 when the IP number initially changed; upgrading was one of the things I tried to get things running again.)

Typical log error, and I know they don't say much. That was all there was besides the normal stuff.

time="2024-02-13T17:59:07.748363621Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=************ req.method=PUT req.path="/_matrix/client/r0/rooms/!hooh0PDGQ1jLlf5u:************/send/m.room.encrypted/kMXEventLocalId_************************" user_id="@*************"

Unfortunately, it seems deleting jetstream might have caused secondary issues that I'll now try to fix manually.

time="2024-02-13T19:47:34.506466310Z" level=warning msg="failed to get state after $************* locally" context=missing error="storage: event IDs missing from the database (0 != 1)" room_id="!****************" txn_event="$****************" txn_prev_events="[*********************]"

niebloomj commented 8 months ago

I keep having this happen to me. My server is completely busted until I run rm -rf ./jetstream and restart dendrite. Then things come back online, but I get a bunch of broken event ID error logs for a few hours.