shaunagm / WelcomeBot

Add rules to catch un-identified names #1

Closed shaunagm closed 9 years ago

shaunagm commented 10 years ago

When users who have registered their nicknames join IRC without identifying, different IRC clients will change their nickname in different ways.

For instance, the default behavior in Quassel is to add a trailing underscore (or two, or three) if you haven't identified to your nick. The bot currently catches this (https://github.com/shaunagm/oh-irc-bot/blob/master/bot.py#L113), but if we can determine the rules for other popular clients we can catch those as well.

To improve the bot, please:

1) Research an IRC client (ideally starting with the most popular ones) and determine the default for un-identified nicks.

AND/OR

You can look through the stored nicks for patterns.

2) Add to bot.py, in the function clean_nick(), appropriate rules to deal with that behavior.

Update: The bot now deals with trailing underscores, trailing numbers, and things with a pipe. Is there anything we're missing? The main one I see when browsing through the nicks is something like: name[Mobile]
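One way the name[Mobile] pattern could be handled is sketched below. This is only an illustration: `strip_bracket_tag` is a hypothetical helper, not something in bot.py, and it assumes the tag is always a single bracketed group at the very end of the nick. Square brackets are valid nick characters on IRC, so a rule like this could also merge two genuinely different users.

```python
import re

# Hypothetical pattern: one "[...]" group at the very end of the nick.
_BRACKET_TAG = re.compile(r"\[[^\[\]]*\]$")


def strip_bracket_tag(nick):
    """Return nick without a trailing bracketed tag such as "[Mobile]"."""
    stripped = _BRACKET_TAG.sub("", nick)
    # If the whole nick was a bracketed tag, keep it unchanged.
    return stripped or nick
```

For example, `strip_bracket_tag("name[Mobile]")` gives `"name"`, while a nick with no trailing tag is returned as-is.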

shaunagm commented 10 years ago

Two potential rules:

1) Any nick with a pipe ("|") that matches the first half - for instance "shauna" and "shauna|away" or "shauna|work" should match.

2) Anything that ends in numbers but matches letters preceding - for instance "shauna" and "shauna1", "shauna2".
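A minimal sketch of those two rules, assuming they are applied in the order listed. The name `strip_pipe_and_digits` is made up for illustration, and the actual clean_nick() in bot.py may look quite different.

```python
def strip_pipe_and_digits(nick):
    """Apply the two proposed rules to a nick (illustrative only)."""
    # Rule 1: keep only the part before the first pipe,
    # so "shauna|away" and "shauna|work" both become "shauna".
    base = nick.split("|", 1)[0]
    # Rule 2: drop trailing digits, so "shauna1" becomes "shauna".
    base = base.rstrip("0123456789")
    # If nothing is left (e.g. the nick was all digits), keep the original.
    return base or nick
```

Falling back to the original nick when everything is stripped is a design choice made here, so that an all-digit nick is not reduced to an empty string that would match every other such nick.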

shaunagm commented 10 years ago

This pull request partially addresses this issue (although there are probably more rules we can create):

https://github.com/shaunagm/WelcomeBot/pull/29

Currently waiting on the PR submitter to see if they want to redo it to fit with the new format of the bot.

aaparella commented 10 years ago

Going to re-implement this tomorrow. Any idea which rules we'd like to apply? I'm having trouble getting the correct information to display here (markdown doesn't like angle brackets?), but the rules I applied last time were a known name followed by either some set of numbers, or a pipe and then any arbitrary text.

shaunagm commented 10 years ago

We should probably also get rid of trailing underscores. And, if we don't already, make sure capitalization is disregarded.

aaparella commented 10 years ago

Is the general idea that, if a new user joins, and matches an already-met user after removing underscores, numbers, etc., we ignore them? Do we also want to greet them with a name stripped of those identifiers?

shaunagm commented 10 years ago

The idea is that we ignore them. WelcomeBot is only supposed to greet newcomers. In some hypothetical future we could implement a function where, if you haven't been there in a while, WelcomeBot says "Welcome back!" but I don't think that makes sense to worry about right now, since we don't currently store datetimes of when people are first greeted.
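The "ignore anyone who matches a known user" check could be sketched roughly as follows. Both function names and the `known_nicks` argument are assumptions made for the example. The normalization shown covers only trailing underscores and capitalization, and the bot's real storage and lookup may differ.

```python
def normalize(nick):
    """Assumed normalization: drop trailing underscores, ignore case."""
    base = nick.rstrip("_").lower()
    # Keep something comparable even if the nick was only underscores.
    return base or nick.lower()


def is_newcomer(nick, known_nicks):
    """Return True if no known nick normalizes to the same string."""
    seen = {normalize(known) for known in known_nicks}
    return normalize(nick) not in seen
```

With this sketch, `is_newcomer("Shauna__", ["shauna"])` is `False`, so the bot would stay quiet, while an unseen nick returns `True` and would be greeted.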

aaparella commented 10 years ago

Added a basic function: https://github.com/aaparella/WelcomeBot/blob/recognize-unregistered-nicks/bot.py#L156

The idea is it would be called instead of actor.replace("_", "") in the check for whether or not a user is a newcomer. I'm sort of torn on the order in which to check for the various delimiters, and am even more torn on what to name the function (though, of course, that's not as important).

Is this the sort of thing you had in mind?

shaunagm commented 10 years ago

I've merged this PR but I'm leaving this issue open, as I'm sure there are more rules we could implement.

shaunagm commented 10 years ago

@aaparella I must have messed up on the merge, because your changes never made it through. I re-added them manually, with a comment crediting you on the commit: https://github.com/shaunagm/WelcomeBot/commit/06077c3e6350baeea02d34fc586b5446f6b5cc4f

shaunagm commented 9 years ago

Looks like there's a bug in this - the stripped names should be used for record keeping purposes, not to communicate with the person. Perhaps we need separate variables for actor-record and actor-nick?

aaparella commented 9 years ago

Ah, I think I may have misunderstood your intention. I'll get a fix soon.

aaparella commented 9 years ago

The reason seems to be the use of the clean_nicks function in parse_message. Should we be removing added identifiers at that stage? Without that I believe it functions as intended (greeting goes to current user name, not cleaned version).

shaunagm commented 9 years ago

Agreed - the question is where we do want to use clean_nicks/remove identifiers. Alternatively, we could do it early on but store both a cleaned and a non-stripped version and reference each as needed.

aaparella commented 9 years ago

Every comparison I can think of should use the "cleaned" nick (for checking whether they are new, etc.), and we would only want their "full" name when greeting them. I think it would make sense to simply create a cleaned version of the nick in the NewComer constructor and use each as needed.

That is, unless I'm missing a case in which we would want to use the full nick?
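The plan of keeping both forms on the newcomer object might look something like the sketch below. The class here is a hypothetical stand-in rather than the bot's actual NewComer class. The `clean` helper, the attribute names, and the greeting text are all placeholders chosen for the example.

```python
def clean(nick):
    """Assumed rules: pipe suffix, trailing digits/underscores, and case."""
    base = nick.split("|", 1)[0].rstrip("_0123456789").lower()
    # Fall back to the lowercased nick if everything was stripped.
    return base or nick.lower()


class NewComer:
    """Hypothetical stand-in for the bot's NewComer class."""

    def __init__(self, nick):
        self.nick = nick               # full nick, used only when greeting
        self.clean_nick = clean(nick)  # cleaned nick, used for comparisons

    def greeting(self):
        # Placeholder text; the bot's real greeting is not shown in this thread.
        return "Welcome, {}!".format(self.nick)


def already_known(newcomer, known_clean_nicks):
    """Compare on the cleaned nick only."""
    return newcomer.clean_nick in known_clean_nicks
```

Because the cleaned form is computed once in the constructor, record keeping and lookups never touch the full nick, and the greeting never touches the cleaned one.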

shaunagm commented 9 years ago

I think that the greeting is the only point at which we'd want the 'full' name, though I may be missing something. I like your plan.

Would you like to work on the fix for this issue?

aaparella commented 9 years ago

I'd be happy to. Should be able to get it done tonight.

aaparella commented 9 years ago

Pull request made (#48)

shaunagm commented 9 years ago

Merged! Thanks so much @aaparella. I think we've hit all the common ways that nicks get altered, so I'm going to close this, but anyone reading this should feel free to re-open if they find a new pattern we should account for.