Closed ara4n closed 3 years ago
This is hard. ASes were specifically designed NOT to be on the critical path for Matrix HS operations. This means it is not possible for us to synchronously attempt to join the #channel, get rejected, then reject the Matrix join request. The idea was that if you wanted that, you'd set up the mythical Policy Server.
another solution to this would be per-msg ACLs.
this race is particularly bad during bridge startup whilst it's syncing IRC clients onto the network, as the IRC side of the bridge may not have yet connected, whilst the Matrix side is still visible.
Though this is mitigated if the startup period is small (e.g. on IPv6 the difference is an order of magnitude: 10 minutes vs 400+ minutes).
Whilst this bug was originally specifically about leaking +i and +k history, we generally have a problem that history can be leaked to Matrix users for any channel before the IRC virtual user has joined the room. This is a major problem for Freenode, given it means users who are not connected to a channel can still see room history as relayed by the frontier user. Renaming this bug to cover the general problem.
In terms of mitigations:
Swapping to the new membership list API is the quick win here: it'll (hopefully) be more reliable (eg #303) than /sync
for Freenode, and (hopefully) be much faster too, reducing the lag time.
As for the options to kill the race entirely:
I thought we had a solution earlier when discussing this early IRL: for the bridge to only forward traffic for a given room if all the virtual users were connected (i.e. option 3 from above), based on the view of current membership seen by the frontier user. However, this still seems to fail for all the reasons Kegan mentions above. So I think I'm forgetting what the subtlety was...
Another solution on this could be to use https://github.com/Half-Shot/irc-conntrack in future to ensure that IRC connections rarely drop in the first place. Then, we can perhaps enforce that traffic is only ever bridged matrix->irc for a room if we believe all the expected Matrix users are currently joined to that channel.
That sounds like a good solution to me. We never had time to unpick IRC connection management from bridge semantics in the past.
It does however come with a bunch of caveats, notably:
Not relaying until all users are either joined or kicked (if they cannot join) doesn't kill the race entirely, so isn't a feasible option. If TCP connections drop silently (which they frequently do), messages will still be relayed until timeouts kick in (be it TCP level or ping timeouts). Even if this was regarded as "acceptable", what are the semantics if your connection does time out? Kick the corresponding Matrix user immediately from the room? That's a huge UX fail.
By enforcing that traffic is only ever bridged matrix -> irc for a room if all the expected Matrix users are joined to the channel, we make the bridge less robust to connection interruptions. E.g if there is a 1% chance of a connection dying at any one time, and you have 50 users in a room, assuming the connections are sufficiently independent, the chance of any single connection dropping is 50%.
This may not be an issue in practice, given how networking is done on the IRC bridge (assuming that hasn't changed much in the past year or so).
This may not be an issue in practice, given how networking is done on the IRC bridge (assuming that hasn't changed much in the past year or so).
From what we can see of the logs, this happens very rarely in practice these days. TCP connections pinging out to the bridge are rare, and if they happened, we should dutifully go and kick out the Matrix user from the room in question to preserve membership sync.
@Half-Shot is this not solved by https://github.com/matrix-org/matrix-appservice-irc/pull/1337 (which implements option 3 from https://github.com/matrix-org/matrix-appservice-irc/issues/212#issuecomment-266396852)? Or is this not working due to the concern listed at https://github.com/matrix-org/matrix-appservice-irc/pull/1337#issuecomment-850944025?
I believe this is solved by #1337 (and subsequent PRs) where we now (optionally) drop all traffic I->M for a given room if it's members are not all connected to IRC (and we subsequently will try to join and then kick anyone who explicitly cannot join). There is now a state event sent into rooms by the bridge when this happens, so people can follow along at home.
We don't know if a given user has permission to join an IRC channel unless they attempt to do so. So you can end up with a Matrix user joining a Matrix room, which causes their virtual user to join the irc room. If the user is denied access, then we kick them on Matrix. But in that time they may have seen some messages that other valid users bridged into the Matrix room. We need to better bridge ACLs so they can't try to join in the first place, or block them from joining the Matrix room until their IRC client has also joined. In short, we should never share history from other users unless the user can see it themselves