matrix-org / matrix-appservice-irc

Node.js IRC bridge for Matrix
Apache License 2.0
462 stars 151 forks source link

Don't use user@host to determine message uniqueness as it can vary on different connections #173

Open juliusv opened 7 years ago

juliusv commented 7 years ago

In the bridged #prometheus channel from Freenode, https://vector.im/beta/#/room/#freenode_#prometheus:matrix.org, the messages of many users from IRC now arrive duplicated in Matrix. That makes the channel less and less usable through Matrix :-/

dupes

I also see this in the mobile Vector client, but don't see the dupes on IRC itself, so I guess it would be a bridging bug.

kegsay commented 7 years ago

Interesting that it only happens with some and not all messages. Logs:

2016-09-15 14:23:01 DEBUG:IrcEventBroker M-daodennis is claiming a hash for cmd PRIVMSG
2016-09-15 14:23:01 INFO:req [8b5h6pn934kcg] [I->M] onMessage: irc.freenode.net from=SuperQ (null@irc.freenode.net) to=#prometheus
2016-09-15 14:47:17 DEBUG:IrcEventBroker murtaza1983[m]1 is claiming a hash for cmd PRIVMSG
2016-09-15 14:47:17 INFO:req [1it4pcjzw8v4k] [I->M] onMessage: irc.freenode.net from=ton31337 (null@irc.freenode.net) to=#prometheus
2016-09-15 14:47:18 DEBUG:IrcEventBroker murtaza1983[m]1 is claiming a hash for cmd PRIVMSG
2016-09-15 14:47:18 INFO:req [7g4is1zglxsso] [I->M] onMessage: irc.freenode.net from=ton31337 (null@irc.freenode.net) to=#prometheus

It looks like any message claimed by murtaza1983[m]1 gets duplicated, but messages claimed by M-daodennis are not.

murtaza1983[m] does seem to have 2 instances (one has the 1 suffix, but their user IDs are distinct), but it looks like we claim based off nick - https://github.com/matrix-org/matrix-appservice-irc/blob/develop/lib/irc/IrcEventBroker.js#L165 - Both clients will desire the same nick, but they should've been clobbered and updated correctly. If, hypothetically, this didn't happen, then you would have 2 clients which think they have the same nick, and if they are the claimer then they would send the message through twice (once f.e message received down their TCP connection). I suspect that is what is happening here.

juliusv commented 7 years ago

Note that I haven't seen this happening anymore for a while. I think it was "fixed" by someone restarting the entire IRC bridge after I reported this.

devnoname120 commented 7 years ago

@juliusv I can confirm that I saw this happen today in #xkcd:freenode.net. Only one guy was affected, and he did not use unconventional software according to him. Another user using an IRC client said that the messages of Stereo did not appear duplicated for him.

matrix duplicated bug

Here is an example of a duplicated message:

{
  "origin_server_ts": 1489698340603,
  "sender": "@_foonetic_Stereo:matrix.org",
  "event_id": "$1489698340284313TdloV:matrix.org",
  "unsigned": {
    "age": 52
  },
  "content": {
    "body": "i'm not using anything unconventional, must be on your end",
    "msgtype": "m.text",
    "formatted_body": "i'm not using anything unconventional, must be on your end",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SnCqwjBFxKGDZmmeRW:matrix.org"
}
{
  "origin_server_ts": 1489698340680,
  "sender": "@_foonetic_Stereo:matrix.org",
  "event_id": "$1489698340284314gExPF:matrix.org",
  "unsigned": {
    "age": 91
  },
  "content": {
    "body": "i'm not using anything unconventional, must be on your end",
    "msgtype": "m.text",
    "formatted_body": "i'm not using anything unconventional, must be on your end",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SnCqwjBFxKGDZmmeRW:matrix.org"
}
kegsay commented 7 years ago

Interesting. Thanks for the info, that should be enough for me to correlate with logs. If it's the same underlying cause, I think I know a way of resolving it but we'll see.

kegsay commented 7 years ago

Right, so that channel only has a handful of Matrix users in it, and none of them are the same name/nick, nor were there any connection issues with those connections, and no one ever stole responsibility for passing messages through to the IrcHandler. I see the message is genuinely processed twice, as before:

2017-03-16 21:05:33 DEBUG:IrcEventBroker Cleaning up 53 entries from irc.foonetic.net
2017-03-16 21:05:40 INFO:req [k0upi4h3kv4k4] [I->M] onMessage: irc.foonetic.net from=Stereo (null@irc.foonetic.net) to=#xkcd
2017-03-16 21:05:40 INFO:req [k0upi4h3kv4k4] [I->M] Relaying in room !SnCqwjBFxKGDZmmeRW:matrix.org
2017-03-16 21:05:40 INFO:IrcBridge [-] PUT http://matrix.org/_matrix/client/r0/rooms/!SnCqwjBFxKGDZmmeRW%3Amatrix.org/send/m.room.message/m1489698340587.294 (@_foonetic_Stereo:matrix.org)
2017-03-16 21:05:40 INFO:IrcBridge [-] PUT http://matrix.org/_matrix/client/r0/rooms/!SnCqwjBFxKGDZmmeRW%3Amatrix.org/send/m.room.message/m1489698340587.294 (@_foonetic_Stereo:matrix.org) HTTP 200 {"event_id":"$1489698340284313TdloV:matrix.org"}
2017-03-16 21:05:40 INFO:IrcBridge [k0upi4h3kv4k4] [I->M] SUCCESS  (48ms)
2017-03-16 21:05:40 INFO:req [5rdt37iw7ds0c] [I->M] onMessage: irc.foonetic.net from=Stereo (null@irc.foonetic.net) to=#xkcd
2017-03-16 21:05:40 INFO:req [5rdt37iw7ds0c] [I->M] Relaying in room !SnCqwjBFxKGDZmmeRW:matrix.org
2017-03-16 21:05:40 INFO:IrcBridge [-] PUT http://matrix.org/_matrix/client/r0/rooms/!SnCqwjBFxKGDZmmeRW%3Amatrix.org/send/m.room.message/m1489698340659.295 (@_foonetic_Stereo:matrix.org)
2017-03-16 21:05:40 INFO:IrcBridge [-] PUT http://matrix.org/_matrix/client/r0/rooms/!SnCqwjBFxKGDZmmeRW%3Amatrix.org/send/m.room.message/m1489698340659.295 (@_foonetic_Stereo:matrix.org) HTTP 200 {"event_id":"$1489698340284314gExPF:matrix.org"}
2017-03-16 21:05:40 INFO:IrcBridge [5rdt37iw7ds0c] [I->M] SUCCESS  (58ms)

An interesting point is just when that duplicate came in - just after the previous message had been fully sent. If this was a case of "2 clients claiming the same IRC line" then I'd expect to see the logs in parallel, rather than serial like it is here. It is in parallel on other messages.

This feels more like a race condition in the bridge now, rather than something triggered by funky connections. EVERY one of this person's messages are duplicated, right from when the bridge started up.

kegsay commented 7 years ago

Do all other users see the same prefix for a given originator? LeoNerd sez:

I believe they do, yes. Since it relates only to the originator of the message and not its recipient. I could imagine cases where e.g. Alice had hostname cloaking enabled, so most users get hostname hidden, but Charlie is an IRCOp so is allowed to see through the cloak. But I've no idea if those exist in practice

I have /whois'd the few connections there are in this scenario, no one is flagged as an IRC op (or even logged in with NickServ), so it doesn't sound like that is the case. @leonerd suggests it may be a case-folding issue on the prefix.

I have added verbose logging to the hash with which we claim messages from. @devnoname120: Please can you let me know as soon as this happens again, and I can then take this further, thanks!

devnoname120 commented 7 years ago

@Kegsay This happened again with the same user (not sure if it ever stopped to happen).

{
  "origin_server_ts": 1489790873820,
  "sender": "@_foonetic_Stereo:matrix.org",
  "event_id": "$14897908731044268CqrQT:matrix.org",
  "unsigned": {
    "age": 898
  },
  "content": {
    "body": "just because it's a nonsensical theory doesn't mean it can't say things that are true >_>",
    "msgtype": "m.text",
    "formatted_body": "just because it's a nonsensical theory doesn't mean it can't say things that are true >_>",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SnCqwjBFxKGDZmmeRW:matrix.org"
}
{
  "origin_server_ts": 1489790892531,
  "sender": "@_foonetic_relsqui:matrix.org",
  "event_id": "$14897908921044416eUEin:matrix.org",
  "unsigned": {
    "age": 308
  },
  "content": {
    "body": "fuck yes, remembered which is which of broca's and wernicke's aphasia because of a line in the linguistics love song",
    "msgtype": "m.text",
    "formatted_body": "fuck yes, remembered which is which of broca's and wernicke's aphasia because of a line in the linguistics love song",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SnCqwjBFxKGDZmmeRW:matrix.org"
}
kegsay commented 7 years ago

Right, the hashes are different but not for the casing.

2017-03-17 22:47:53 DEBUG:IrcEventBroker ProcessedDict.claim Stereo!stereo@Clk-2541159E.de 
 PRIVMSG  #xkcd
2017-03-17 22:47:53 DEBUG:IrcEventBroker ProcessedDict.claim Stereo!stereo@hide-2541159E.de 
 PRIVMSG  #xkcd

Notice how this user is cloaking themselves, but the way the cloak is represented in the hostname differs. Spoke with @leonerd and he suspects that this is due to a misconfigured IRCd (they don't cloak in the same way!!). @leonerd suggests using the case-folded nick only instead of the entire Nick!user@host as it's entirely possible that this may happen for other cases too (like the aforementioned IRC op).

@devnoname120 : Thank you very much!

devnoname120 commented 7 years ago

@Kegsay This happened again with another user on another room (#henkaku:matrix.org), and this user doesn't cloak themselves:

<Me> SonyUSA: Your messages appear duplicated on Matrix, do you cloak yourself?
<SonyUSA> cloak? i havent done anything

Example of duplicated message:

{
  "origin_server_ts": 1492030436520,
  "sender": "@freenode_SonyUSA:matrix.org",
  "event_id": "$1492030436322277zOLUY:matrix.org",
  "unsigned": {
    "age": 239
  },
  "content": {
    "body": "oh",
    "msgtype": "m.text"
  },
  "type": "m.room.message",
  "room_id": "!SabTNcrymKhDfHdoaQ:xyz.is"
}
{
  "origin_server_ts": 1492030436738,
  "sender": "@freenode_SonyUSA:matrix.org",
  "event_id": "$1492030436322287uRFPG:matrix.org",
  "unsigned": {
    "age": 380
  },
  "content": {
    "body": "oh",
    "msgtype": "m.text"
  },
  "type": "m.room.message",
  "room_id": "!SabTNcrymKhDfHdoaQ:xyz.is"
}
kegsay commented 7 years ago

@devnoname120 - I assume that:

If so, it's likely it's caused by the same issue, but not directly caused by cloaking.

devnoname120 commented 7 years ago

@Kegsay

kegsay commented 7 years ago

I can't see that message as I wasn't in the room at the time. Please paste.

xyzz commented 7 years ago

Example of non duplicated message:

{
  "origin_server_ts": 1492140244924,
  "sender": "@freenode_SonyUSA:matrix.org",
  "event_id": "$14921402441436483EwaDz:matrix.org",
  "unsigned": {
    "age": 362640
  },
  "content": {
    "body": "and then it shows a shot of her gravestone",
    "msgtype": "m.text"
  },
  "type": "m.room.message",
  "room_id": "!SabTNcrymKhDfHdoaQ:xyz.is"
}

Example of duplicated message:

{
  "origin_server_ts": 1492140248922,
  "sender": "@freenode_SonyUSA:matrix.org",
  "event_id": "$14921402481436525FbufA:matrix.org",
  "unsigned": {
    "age": 358642
  },
  "content": {
    "body": "and her gravestone says \"SMH\"",
    "msgtype": "m.text",
    "formatted_body": "and her gravestone says &quot;SMH&quot;",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SabTNcrymKhDfHdoaQ:xyz.is"
}
{
  "origin_server_ts": 1492140249889,
  "sender": "@freenode_SonyUSA:matrix.org",
  "event_id": "$14921402491436529JyVue:matrix.org",
  "unsigned": {
    "age": 357675
  },
  "content": {
    "body": "and her gravestone says \"SMH\"",
    "msgtype": "m.text",
    "formatted_body": "and her gravestone says &quot;SMH&quot;",
    "format": "org.matrix.custom.html"
  },
  "type": "m.room.message",
  "room_id": "!SabTNcrymKhDfHdoaQ:xyz.is"
}

This keeps happening to this user. I haven't seen it happen to any other IRC users.

Edit: It feels that about half of this user's messages are duplicated. I haven't counted though.