matrix-org / matrix-appservice-irc

Node.js IRC bridge for Matrix
Apache License 2.0

One or two sed bugs #1627

Open nckx opened 1 year ago

nckx commented 1 year ago

Hi! Not a severe bug report, but I just saw a strange s/…/ cross the bridge and thought you might be interested.

When a Matrix user edited

user: I am new to guix. When reading https://guix.gnu.org/manual/devel/en/html_node/Using-the-Configuration-System.html, I says "Rust is currently unavailable on non-x86_64 platforms". Is that still true?

into

user: I am new to guix. When reading https://guix.gnu.org/manual/devel/en/html_node/Using-the-Configuration-System.html, it says "Rust is currently unavailable on non-x86_64 platforms". Is that still true?

the bridge produced:

user[m]: s/html_node/html\_node/, s/I/it/, s/x86_64/x86\_64/

…wait, why is _ being escaped at all?

progval commented 1 year ago
> • I'm not entirely sure whether s/I/it/ is strictly a bug or an attempt to emulate how humans use s/…/ informally, expecting the reader to know which I was obviously wrong. If so, maybe that heuristic could be tweaked to avoid the extreme ambiguity here, or maybe it's working as designed. Or it's a bug :-)

https://github.com/matrix-org/matrix-appservice-irc/issues/1574

> • The other two are just bogus. It looks like a string is mistakenly escaped twice before being fed into the diff — but interestingly, only on one side?

> …wait, why is _ being escaped at all?

Probably https://github.com/vector-im/element-web/issues/22456

Matrix messages work like multipart emails; i.e., senders write the same content in multiple formats, and receivers pick only one of them. Here, Element sends different content in each format, and matrix-appservice-irc does not read the same format that most Matrix clients do.
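To make the multipart comparison concrete, here is a minimal sketch of how a mismatch between the two formats could yield the output reported above. The `body`, `format` and `formatted_body` fields are real Matrix message fields; everything else is an assumption for illustration: the shortened sample strings, the guess that the edit's plain-text `body` is the side carrying the backslashes, and the hypothetical `sedStyleDiff` helper, which is not the bridge's actual diff code.

```typescript
// Simplified shape of an m.room.message content (only the fields used here).
interface MessageContent {
  msgtype: "m.text";
  body: string;                        // plain-text representation
  format?: "org.matrix.custom.html";
  formatted_body?: string;             // HTML representation
}

// Illustrative values only. ASSUMPTION: the edit's plain-text body contains
// Markdown escapes, while its HTML body does not.
const original: MessageContent = {
  msgtype: "m.text",
  body: "see html_node where I says x86_64",
};
const edited: MessageContent = {
  msgtype: "m.text",
  body: "see html\\_node where it says x86\\_64",
  format: "org.matrix.custom.html",
  formatted_body: "see html_node where it says x86_64",
};

// Hypothetical, naive word-by-word comparison. It only handles texts with
// the same number of whitespace-separated words; otherwise it falls back to
// a single whole-message substitution.
function sedStyleDiff(before: string, after: string): string {
  const a = before.split(/\s+/);
  const b = after.split(/\s+/);
  if (a.length !== b.length) return `s/${before}/${after}/`;
  const subs: string[] = [];
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) subs.push(`s/${a[i]}/${b[i]}/`);
  }
  return subs.join(", ");
}

// Reading the plain-text side picks up the stray backslashes...
const fromPlain = sedStyleDiff(original.body, edited.body);
// ...while the HTML side (tag-free here; real HTML would need stripping
// first) only shows the change the user actually made.
const fromHtml = sedStyleDiff(original.body, edited.formatted_body ?? edited.body);

console.log(fromPlain);
console.log(fromHtml);
```

Under these assumptions, the two reads disagree only because the sender put different text in each representation, which would explain why the escaping appears on just one side of the diff.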

nckx commented 1 year ago

Thanks for the extremely swift reply.

[Written before your edit above] A (different) user reasonably pointed out that the escaping might have been done intentionally by the (above) user, to remove unwanted mark-up.

I did look at the message on the Matrix side before reporting this and didn't see any formatting, but now I notice both ‘versions’ are in fact identical in Element: https://www.tobias.gr/temp.png

I'm aware that this isn't the Element bug tracker, but am I misunderstanding how history works?

tadzik commented 1 year ago

> …wait, why is _ being escaped at all?

That'd be a bug, yes :)

> * I'm not entirely sure whether `s/I/it/` is strictly a bug or an attempt to emulate how humans use `s/…/` informally, expecting the reader to know _which_ `I` was obviously wrong.

Yeah, that's not ideal. Off the top of my head, the only way to improve it without adding massive amounts of noise would be to use the regex adverbs that Raku (formerly Perl 6) uses, like `s:2nd/I/it/` – though it's perhaps a stretch to assume that the average person will find that familiar and/or easily understandable.
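For readers unfamiliar with that notation, the following sketch shows what an "nth occurrence" substitution would mean when applied by a reader or a client. The helper name `replaceNth` is hypothetical and not part of the bridge; it handles literal strings only, not regular expressions, and the sample sentence is shortened from the one in the report.

```typescript
// Replace only the n-th (1-based) non-overlapping occurrence of a literal
// string, leaving the text unchanged if there are fewer than n matches.
function replaceNth(text: string, search: string, replacement: string, n: number): string {
  if (n < 1 || search.length === 0) return text;
  let from = 0;
  let index = -1;
  for (let i = 0; i < n; i++) {
    index = text.indexOf(search, from);
    if (index === -1) return text;     // fewer than n occurrences
    from = index + search.length;      // continue after this match
  }
  return text.slice(0, index) + replacement + text.slice(index + search.length);
}

const before = "I am new. When reading the manual, I says so. Is that true?";
// Equivalent in spirit to s:2nd/I/it/ on the line above.
const after = replaceNth(before, "I", "it", 2);
console.log(after);
```

A plain `s/I/it/` is ambiguous here because the sentence contains three capital `I`s; the ordinal removes the ambiguity at the cost of less familiar syntax.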