sopel-irc / sopel

:robot::speech_balloon: An easy-to-use and highly extensible IRC Bot framework. Formerly Willie.
https://sopel.chat
Other
948 stars 403 forks source link

Stripping control codes #462

Open spreadred opened 10 years ago

spreadred commented 10 years ago

I am trying to write a bot using @rules to trigger certain functions based on channel text. mIRC control codes seem to interfere with this. Is there a way an option to get willie to strip all mIRC control codes before passing the text to @rule?

elad661 commented 10 years ago

Right now we don't have this, but I assume the proper fix would be adding an optional boolean parameter to @rule, that will probably be named strip_control_codes.

When that boolean set to true, willie will know to strip control codes from the message before matching the regex. Message passed to the callable will be then clean from control codes as well.

As not all modules will want that, the default will be set to False.

To implement that, we'll need to make a list of all possible control codes.

spreadred commented 10 years ago

I'm not sure if these are all the possible control codes, but this seems to work to strip the most common (maybe all). I got this expression from a user on stackexchange: regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

elad661 commented 10 years ago

I don't put code copypasted from stackoverflow into my projects.

When you write code, you should not use copy-paste, you should understand what you're doing. In the case of this regex, for example, we don't know what each of this character bytecodes is, and we don't know why re.UNICODE is used. Therefor, this should not be used.

The proper way to implement that would be reading up on these control codes. If they are standard ascii control codes, we already have comprehensive documentation. If not, we need to learn the format and make a list of all possible codes. Leaving a regex like that in will result in cryptic code which will be a pain in the ass to debug.

spreadred commented 10 years ago

I wasn't suggesting that you implement the code from SE. I'm not a professional programmer and I'm completely new to python. I was simply trying to help. As for the usage of control codes, I'm not sure if you can find a definitive document listing them all, as they seem to differ a bit based on which client someone may be using. This might be helpful if you're interested in implementing this stripping feature, although, as I said, I'm unsure if this is an accurate or exhaustive list. http://www.ircbeginner.com/ircinfo/colors.html

Exirel commented 5 years ago

Are mIRC control codes still a thing today?

spreadred commented 5 years ago

@Exirel mIRC is still alive and well, although not under active development IIRC. If you happen to find yourself in channels dedicated to serving files via XDCC and/or other methods, you likely will encounter these control codes, which are used to make their "pack listings" stand out. Although many modern clients seem to transparently strip them.

Exirel commented 5 years ago

Yeah, I saw the latest release of mIRC is from 2018, so it's alive, that's nice! I see that @dgw used some regex in #34, so maybe that could be reused to implement a @rule(..., skip_control_code=True)?

dgw commented 5 years ago

@Exirel That should be sopel-irc/sopel-extras#34, but yes. :grin:

I'm actually going to retitle the issue, because referring to them as "mIRC control codes" is no longer accurate. While mIRC may have begun the trend toward formatting support for IRC, it's far from the only client to recognize formatting characters today. #1302 tracked implementation of all remaining (not yet supported by sopel.tools.formatting methods) codes known to exist in at least one client, as documented by modern.ircdocs.horse at the time.

Exirel commented 5 years ago

Hm, in this case, I even wonder if that skip_control_code argument could be False in 7.x, then True in 8.x if the usage is so common? I mean, that looks reasonable to me.

dgw commented 4 years ago

Implementation thought: https://github.com/sopel-irc/sopel/issues/1768#issuecomment-625857898

dgw commented 4 years ago

After implementation discussion on IRC, we've (@Exirel and I) decided that it'll be cleaner to do this in 8.0 after we no longer have to worry about Python 2 support.

dgw commented 3 years ago

We are now in the 8.0 dev cycle and firmly located in py3-only land. This can be implemented with the use of keyword-only arguments to the relevant decorators now.

Exirel commented 2 years ago

Oh no. I fell down the rabbit hole:

However, I've a counter offer to make:

from sopel import plugin

def _rule_handler(bot, trigger):
    ...

@plugin.plain  # or @plugin.plain(True)
@plugin.rule('.*')
def plain_trigger(bot, trigger):
    _rule_handler(bot, trigger)

@plugin.plain(False)  # the default
@plugin.rule('.*')
def plain_trigger(bot, trigger):
    _rule_handler(bot, trigger)

If someone is really into having both version at the same time, they always can. Making plain a single flag on a plugin callable makes everything more simple and "just work" with very few changes.

Sure, that completely throw away the "but we have to wait until Python 3+ only" thing and also it forces plugin authors to create wrappers if they want both. On the other hand... what are the odd, exactly, to match both version?

... but wait! I've another idea: instead of a boolean flag, plain (or whatever the flag will be), could be more than a binary value. For instance, what if:


@plugin.plain(plugin.PLAIN_ONLY)
def plain_trigger(bot, trigger):
    ...

@plugin.plain(plugin.RAW_ONLY)  # the default for now
def raw_trigger(bot, trigger):
    ...

@plugin.plain(plugin.PLAIN_OR_RAW)
def plain_or_raw_trigger(bot, trigger):
    ...

I'm not convinced that someone will want to say "this pattern will match if and only if it's plain while the second pattern will match only if and only if raw" on the same plugin callable... while having pattern that could match on the wrong version.

So yeah, that one last solution is my best offer.

dgw commented 2 years ago

Postponed for rework of internals to make this and other rule-related features work better. You can see some of the discussion here (the rest was on IRC).