matrix-org / libera-chat

Support requests for the libera.chat IRC bridge hosted by matrix.org

Silent message drop since the last maintenance #6

Open airone506 opened 1 year ago

airone506 commented 1 year ago

Messages have been silently dropped since the Libera.chat bridge maintenance and upgrade on May 10th, 2023. The estimated share of dropped messages is ~20% (based on observations; I might add exact numbers later). The affected Libera.chat messages are not displayed to the Matrix users in the room.

I ran some checks in the past, some of them very diligent, and never observed such a massive drop. In fact, I observed 100% message delivery even in the very diligent checks.

airone506 commented 1 year ago

I checked quite a busy Libera.Chat channel bridged to Matrix. Over a period of six days, the delivery success rate was only 84.08%, i.e. almost 16% of messages were lost.

Half-Shot commented 1 year ago

Hiya, do you know which channel(s) you're seeing this on? We deployed a patch last week to hopefully improve the issue we were seeing, so it sounds like we missed something.

airone506 commented 1 year ago

The issue applies to any bridged channel I'm in. The figures I posted above are from #python. ##English is another Libera channel with significant traffic volume where the issue can be clearly seen. Of course it also applies to low-traffic channels like #libera-matrix and #matrix-irc, where I definitely saw the issue as well. The Matrix room #plasma:kde.org, bridged to the Libera channel #plasma, also drops IRC messages...

Cydox commented 1 year ago

Also seeing this on #fedora (on libera.chat; Matrix: #fedora:fedoraproject.org).

A seemingly random sample of messages from people on IRC doesn't appear on Matrix.

airone506 commented 1 year ago

I see the issue in every Libera channel where I am connected via both an IRC client and Matrix. I did the last check today for all channels and I can confirm it really is in every channel I'm in.

It seems that only very few of the users who also use Matrix to access Libera are bothered by this issue. My guess is that most of them are not aware of it.

Kleidukos commented 1 year ago

I can confirm that this is currently affecting the #hackage channel on Libera, if you want to see it for yourself, @Half-Shot :)

progval commented 1 year ago

This also happens in channels with "allowUnconnectedMatrixUsers": true.

alkisg commented 1 year ago

I'm also affected by this, e.g. in #debian, #ubuntu, #debian-next, #ubuntu-server etc (of course all in libera).

srett commented 1 year ago

Still happening as of today.

thecb1 commented 1 year ago

I saw messages getting lost in both ways btw.

airone506 commented 1 year ago

> I saw messages getting lost in both ways btw.

This does not match my experience. Weren't the lost Matrix messages (invisible to Libera) sent by users who did not perform !reconnect after May 10th? That would make perfect sense.

For reference: https://github.com/matrix-org/matrix-appservice-irc/issues/1712

thecb1 commented 1 year ago

Well, it won't reconnect for me ...

i-c-o-n commented 1 year ago

Same in #coreboot, virtually nothing from IRC gets through to Matrix.

airone506 commented 1 year ago

> ... virtually nothing from IRC gets through to Matrix.

Yes, this is the current status of the bridge. It is very different from the status when this issue was opened.

ApostolosB commented 1 year ago

It seems to be working now. Is this considered fixed?

airone506 commented 1 year ago

> ... Is this considered fixed?

Definitely not fixed. The issue still appears as it did when this ticket was opened.

progval commented 1 year ago

Indeed, I just had a message dropped in #swh-team:matrix.org (bridged to #swh-team on Libera) at 08:41:42 UTC, in the Libera->Matrix direction. "allowUnconnectedMatrixUsers": true is set in this room.

AndrewFerr commented 1 year ago

Has this improved at all? Some fixes were deployed ~10h ago and should have settled down by now.

progval commented 1 year ago

I see another loss at 17:42:49 (2.5h ago) on #swh-sysadm:matrix.org.

progval commented 1 year ago

Also at 20:21:54 on #libera:libera.chat

airone506 commented 1 year ago

> Has this improved at all? Some fixes were deployed...

Thank you for deploying the fixes. I'm sorry to report that the issue is still there.

I just did a quick check in the #python Libera channel for the time frame around 18:10 - 18:40 UTC and I saw multiple messages being (seemingly randomly) dropped.

EDIT 1: Just saw multiple messages being dropped in some small Libera channel, time around 21:04 UTC, June 1st.

EDIT 2: Issue seen multiple times in multiple Libera channels between 07:00 and 12:00 UTC, June 2nd.

progval commented 1 year ago

another one today at 16:21:02, again in #swh-team:matrix.org (and it still has "allowUnconnectedMatrixUsers": true). right after the dropped message, a puppet connected to IRC.

etameta commented 1 year ago

Four drops over 20 messages (so 20%) in #linux-it:libera.chat in the last few hours, at 2023-06-06 15:26:08 +0000, 2023-06-06 18:42:55 +0000, 2023-06-06 19:41:54 +0000 and 2023-06-06 19:49:59 +0000 .

Half-Shot commented 1 year ago

I believe this situation has now improved, but please keep letting me know if it's gotten worse.

airone506 commented 1 year ago

I just did a quick ad-hoc "human-powered" scan of #python at Libera.Chat, going backwards in the timeline.

I saw one message in #python dropped at 06:01:29 UTC (June 8th). Text of the message was "scipy.signal", if that helps for searching.

Another message in #python was dropped at 04:45:34 UTC (June 8th). The text of the message was "(or this has been my experience when I've grappled with the same questions, anyway)". I did not continue checking after this one.

No message drop seen after that one at 06:01:29 UTC and I can tell the situation has dramatically improved indeed.

@Half-Shot: Did these two drops occur before a fix was deployed?

inglor commented 1 year ago

> I believe this situation has now improved, but please keep letting me know if it's gotten worse.

Lost messages again this morning, 08 Jun 2023 11:22:29 GMT+1 (according to the IRC client), in #archlinux-aurweb at Libera.Chat. The next message appears 20 seconds later in both IRC and Matrix; screenshots attached.

IRC log: irc-log

Matrix log: matrix-log

This is not a very active channel to have multiple messages coming through.

airone506 commented 1 year ago

More drops spotted in #python at: 11:36:26 UTC, May 8th, 12:36:45 UTC, May 8th, 13:31:26 UTC, May 8th, 13:39:21 UTC, May 8th, 14:00:35 UTC, May 8th, 14:00:36 UTC, May 8th, 14:19:41 UTC, May 8th, ...

airone506 commented 1 year ago

I saw multiple messages being dropped in multiple rooms. Just letting you know that the issue still exists.

srett commented 1 year ago

Still dropping messages as of today

airone506 commented 1 year ago

Still dropping messages as of today.

airone506 commented 1 year ago

Still dropping messages as of today. I believe a lot of users who rely on the bridge as their daily driver are still unaware of this issue.

srett commented 1 year ago

Indeed, I'm starting to feel this state is worse than having no bridge at all. Even if you know about this, it still leads to confusion and misunderstandings if just the right message gets dropped.

simonmichael commented 1 year ago

Does the bridge or any external service log these events yet, to quantify the drops and automate the tedious human reporting?
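
One way such reporting could be automated, sketched under assumptions: if both sides of a bridged room were exported as lists of (sender, text) pairs, a multiset comparison would list the messages missing on Matrix and give the delivery rate. The function name `find_dropped`, the log format, and the sample nicks and messages below are hypothetical; nothing here reflects what the bridge itself logs.

```python
from collections import Counter


def find_dropped(irc_messages, matrix_messages):
    """Return (dropped, delivery_rate) for IRC -> Matrix traffic.

    Both arguments are lists of (sender, text) tuples. Messages are matched
    as a multiset, so a repeated line counts once per occurrence.
    """
    remaining = Counter(matrix_messages)  # copies seen on the Matrix side
    dropped = []
    for msg in irc_messages:
        if remaining[msg] > 0:
            remaining[msg] -= 1  # consume one matching Matrix copy
        else:
            dropped.append(msg)  # no copy left: treat as dropped
    total = len(irc_messages)
    delivery_rate = 1.0 if total == 0 else (total - len(dropped)) / total
    return dropped, delivery_rate


# Hypothetical sample data, for illustration only.
irc_log = [
    ("alice", "hello"),
    ("bob", "scipy.signal"),
    ("alice", "hello"),
    ("carol", "thanks"),
    ("bob", "bye"),
]
matrix_log = [
    ("alice", "hello"),
    ("carol", "thanks"),
    ("bob", "bye"),
    ("alice", "hello"),
]

dropped, rate = find_dropped(irc_log, matrix_log)
print(dropped)         # [('bob', 'scipy.signal')]
print(f"{rate:.2%}")   # 80.00%
```

Matching on (sender, text) alone ignores timestamps and ordering, so a real comparison would probably also need to tolerate formatting differences between the two sides.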

progval commented 1 year ago

$1LVFwOd4_H-bpmQGdRsWLMo-t7gLHOTrXw6Wfw2k-C4 and $lSDhR2hpjPnlgEcKyMVDcdb-C-iRbDgOikadks3Pu2A in #irc:matrix.org, sent today at 19:00 UTC, were not sent to Libera.

srett commented 1 year ago

If fixing this is too hard, maybe make the bridge just forward each message to Matrix twice. This should lower dropped messages from 10% to 1%.
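
As a rough check of that figure: if each forwarded copy were dropped independently with probability p, a message would be lost only when every copy is lost, i.e. with probability p^n for n copies. The sketch below assumes independence, which this thread does not establish (drops tied to a puppet disconnecting, for example, would likely hit both copies), and `residual_drop_rate` is a hypothetical helper name.

```python
def residual_drop_rate(p_drop, copies):
    """Probability that all `copies` of a message are dropped.

    Assumes each copy is dropped independently with probability `p_drop`.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_drop ** copies


print(f"{residual_drop_rate(0.10, 1):.2%}")  # 10.00%
print(f"{residual_drop_rate(0.10, 2):.2%}")  # 1.00%
```

Under the same assumption, most messages would arrive twice, so the Matrix side would need to de-duplicate them.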

progval commented 1 year ago

the two messages tulir posted today on #matrix-dev:matrix.org: $cFi1JlMOJaUjPjEiO_NSw0PLRgAiXgcHejQnqh7A1NU and $HxK2VTDpIS9iiS8BcEFSBpYL3dQrGTVlBjtM8ymCU_8, because his puppet disconnected yesterday at 16:02:59Z (that's probably the same issue as #12)

BrenBarn commented 1 year ago

The issue still seems to be happening. Is there any progress? All I see in recent messages on this ticket is people saying "the problem still exists". . .

srett commented 1 year ago

@BrenBarn libera staff made an announcement a month ago that they'd take some action as of July 1st, but nothing seems to have happened. Which would mean they implicitly picked option 1: https://libera.chat/news/matrix-irc-bridge-updates

BrenBarn commented 1 year ago

@srett: I saw that, but as far as I can see that doesn't really have anything to do with whether this bug is being fixed. That's just libera trying to decide how to work around the problem.

progval commented 1 year ago

This is still happening at the same frequency as last month. Is it useful if we keep reporting them?

ara4n commented 1 year ago

We haven't been updating this issue as much as we could or should; apologies. We've been dealing with the various complications around deportalling while trying to fix the bug, and comms with the libera team have taken priority over comms with the community. Good news(?) is that we have a plausible root cause for the bug and will try to ship a fix on Monday.

I've written up a bit more context on https://news.ycombinator.com/item?id=36923504. Apologies that things got so unreliable for so long.