sopel-irc / sopel

:robot::speech_balloon: An easy-to-use and highly extensible IRC Bot framework. Formerly Willie.
https://sopel.chat

Formatting characters may be grouped in `Trigger` in a surprising way #2590

Open SnoopJ opened 5 months ago

SnoopJ commented 5 months ago

Description

@dgw discovered that my Sopel instance's Unicode plugin was not recognizing italics in a message, even though bold is recognized just fine.

 8.4s unicode_summarize: 
      trigger.raw='@time=2024-01-27T01:42:25.541Z;account=SnoopJ :SnoopJ!~snoopj@user/snoopj PRIVMSG terribot :!u \x1d?' (file:///tmp/q19604111.txt),
      trigger.groups()=('u', '?', '?', None, None, None), s='?'

Where s = trigger.group(3)

This seems to happen because "\x02".isspace() is False while "\x1d".isspace() is True, so the italic character is swallowed by one of the \s tokens baked into the rule that Sopel generates automatically for @plugin.commands().
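The difference can be checked outside of Sopel. The following is a minimal sketch, assuming a simplified stand-in pattern (`simplified_rule` and `first_argument` are hypothetical names, and the pattern is not the rule Sopel actually generates); it only illustrates how a `\s+` separator treats the two formatting characters differently.

```python
import re

BOLD = "\x02"    # IRC bold formatting character
ITALIC = "\x1d"  # IRC italic formatting character

# str.isspace() and the regex \s class (for str patterns) both treat
# \x1d as whitespace, but not \x02.
bold_is_space = BOLD.isspace()
italic_is_space = ITALIC.isspace()
bold_matches_s = re.fullmatch(r"\s", BOLD) is not None
italic_matches_s = re.fullmatch(r"\s", ITALIC) is not None

# Hypothetical, simplified stand-in for a generated command rule:
# prefix, command name, whitespace separator, then the argument text.
simplified_rule = re.compile(r"!(u)(?:\s+(.*))?$")


def first_argument(message):
    """Return the argument text captured after the command, or None."""
    match = simplified_rule.match(message)
    return match.group(2) if match else None


bold_arg = first_argument("!u " + BOLD + "x")      # formatting character kept
italic_arg = first_argument("!u " + ITALIC + "x")  # formatting character lost
```

With this stand-in, the bold character survives in the captured argument, while the italic character is consumed by the separator before the capture group starts.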

As a workaround, I have done the following in that plugin to get a more accurate reflection of the argument (while still ignoring spaces which are not interesting to that plugin's functionality):

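# group(1) is the command name, e.g. "u"
cmd = trigger.group(1)
# Skip the prefix character, the command name, and the separating space
# (assumes a one-character prefix), then drop plain spaces only
s = trigger.group(0)[len(cmd)+2:].replace(" ", "")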
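The same slicing can be tried on plain strings, without a running bot. This is a sketch only: `raw_argument` and its `prefix_len` parameter are hypothetical names introduced for illustration, and the function takes the full matched text and the command name as ordinary strings instead of reading them from a Trigger.

```python
def raw_argument(full_match, cmd, prefix_len=1):
    """Recover the argument text from the full matched message.

    Skips the prefix, the command name, and one separator character,
    then removes plain spaces only, so that formatting characters such
    as "\x1d" are preserved.
    """
    rest = full_match[prefix_len + len(cmd) + 1:]
    return rest.replace(" ", "")


# Full match taken from the logged message above: "!u \x1d?"
recovered = raw_argument("!u \x1d?", "u")
```

Because only the literal space character is removed, the italic character stays in the result. The slice assumes exactly one separator character after the command name.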

Reproduction steps

Write a command that accepts italic text as its argument, send it a message whose argument starts with the italic formatting character (\x1d), then observe that the character is missing from the groups() of the resulting Trigger.

Expected behavior

I would expect formatting characters to be treated as if they were normal text.

Relevant logs

No response

Notes

No response

Sopel version

973a4893

Installation method

pip install

Python version

3.9.16

Operating system

Ubuntu 20.04

IRCd

No response

Relevant plugins

No response

dgw commented 5 months ago

I can add that commit 973a489355540d68b95db01a49e983ac7a740bcc on Python 3.10.7 exhibits related behavior in sopel-spongemock:

\ .smock italic mocking \ iTaLIc MoCkiNg

This command uses trigger.group(2). Note that the result is not italic because \x1d occurs between group 1 and 2. If \x1d appears in the middle of group 2 instead, it works as expected:

\ .smock mocking italicly? \ mOcKinG iTAliClY?
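The position dependence described above can be sketched with a simplified pattern. The `rule` below is a hypothetical stand-in, not the rule Sopel generates, and the placement of \x1d in the two test lines is an assumption, since the formatting characters are not visible in the quoted messages.

```python
import re

ITALIC = "\x1d"  # IRC italic formatting character

# Hypothetical stand-in for a generated command rule: the command is
# group 1, and everything after the whitespace separator is group 2.
rule = re.compile(r"\.(smock)(?:\s+(.*))?$")


def captured_text(line):
    """Return group 2 (the argument text) for a command line, or None."""
    match = rule.match(line)
    return match.group(2) if match else None


# \x1d directly after the separating space: \s+ consumes it, so it
# never reaches group 2.
leading = captured_text(".smock " + ITALIC + "italic mocking")

# \x1d in the middle of the argument: \s+ stops at the first
# non-whitespace character, and group 2 keeps the formatting character.
inner = captured_text(".smock mocking " + ITALIC + "italicly?")
```

In this sketch the first call loses the italic character and the second keeps it, which is consistent with the two outputs reported above.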

Thanks to @SnoopJ for helping track this down, and for writing up the issue!