twisted / twisted

Event-driven networking engine written in Python.
https://twisted.org
Other
5.57k stars 1.17k forks source link

parseuser function for irc.py #3286

Open twisted-trac opened 16 years ago

twisted-trac commented 16 years ago
Wolf's avatar Wolf reported
Trac ID trac#3286
Type enhancement
Created 2008-06-07 04:59:36Z

In irc.py the user is always passed to the API methods in the form 'nick!realname@host', this module-level function will split the string and return 'nick', 'realname', 'host'. If the user can't be parsed it is returned as it is (even if this is quite useless because doing nick, realname, host = parseuser(user) or nick = parseuser(user)[0] will result in an error and returning None will have the same result, so maybe I could raise a specific error).

If the function is ok I can replace all the nick = user.split('!', 1)[0] with nick = parseuser(user)[0].

I also added 5 tests to test_irc.py to test the function.

Attachments:

Searchable metadata ``` trac-id__3286 3286 type__enhancement enhancement reporter__Wolf Wolf priority__normal normal milestone__ branch__ branch_author__ status__new new resolution__None None component__words words keywords__irc_IRCClient_parseuser irc IRCClient parseuser time__1212814776000000 1212814776000000 changetime__1255720190000000 1255720190000000 version__None None owner__jonathanj jonathanj cc__Leonidas ```
twisted-trac commented 16 years ago
exarkun's avatar @exarkun set owner to Wolf
twisted-trac commented 16 years ago
Wolf's avatar Wolf removed owner
twisted-trac commented 16 years ago
glyph's avatar @glyph set owner to @glyph
@glyph set status to assigned
twisted-trac commented 16 years ago
glyph's avatar @glyph set owner to Wolf
@glyph set status to new
twisted-trac commented 15 years ago
jonathanj's avatar @jonathanj removed owner

I've attached a patch that implements an object that parses a prefix (user or otherwise) and exposes some useful attributes. All code that previously split the prefix string manually has been modified to make use of this object.

twisted-trac commented 15 years ago
glyph's avatar @glyph set owner to jonathan
  1. I think _nickOrPrefix needs a better docstring, and perhaps a better name. "Otherwise" parse the prefix? Other than what? What's the return type? And so on.
  2. Aesthetically, I think I might prefer it if it were a classmethod of Prefix.
  3. I'm a little uncomfortable with so many lines of code changing with no change in the covering test code. Maybe that's OK though because it's just a better way to implement the exact same logic, but I suspect that there are things which will be done properly now which were done wrong before.

Otherwise, I really like the idea that "prefixes" will be objects rather than random chunks of strings. Looking forward to landing this.

twisted-trac commented 15 years ago
glyph's avatar @glyph set owner to @jonathanj
twisted-trac commented 15 years ago
glyph's avatar @glyph set owner to @jonathanj

Some of these comments reverse earlier positions that I took, and I'm adding some work rather late in the review cycle here; I apologize for that, but I didn't really understand what was going on before. I can see now that this is going to be an important core of a new IRC API and so it deserves special attention.

  1. Prefix.prefix shouldn't be public. Also, I would much rather that it not be referred to as 'prefix'; it's hard enough already that there's an object called Prefix and a bunch of strings called prefix lying around.
  2. I would really rather there were a method for determining a prefix's known-ness rather than having an algorithm described in the docstring. Consider a Prefix.isKnown().
  3. The docstring needs to link to some standard or RFC so that users can read more in depth about what a prefix really means.
  4. Prefix.nickname, Prefix.realname, and Prefix.host all really need much more detailed documentation. IRC is extremely confusing, especially given the freestyle nonsense that freenode has started putting into these fields.
  5. For consistency, it should be Prefix.hostname; or, it should be Prefix.real and Prefix.nick. Choose a theme and stick with it: 'name' suffix or no? You can disregard this point if the standards consistently refer to each of these by inconsistent names, but I don't think so.
  6. Prefix.prefixUsername should have a different name. On #twisted, idnar suggested displayName, and I think that's a good choice.
  7. Also, there's no reason that Prefix.prefixUsername should be a classmethod, it should be an instancemethod, or possibly even a property. Everywhere that says Prefix.prefixNickname(prefix) could just as easily say Prefix(prefix).displayName()
  8. To summarize my understanding from our IRC conversation, I can see now that handing this "display name" to application in place of a real nickname is wrong and will lead to incorrect behavior in some cases. For example, if one receives a private message from "foo.freenode.net", one can't reply, but the API as it stands makes it impossible to distinguish between that and a user with that nickname. In order to rectify this, we need to pass Prefix objects to application code. I think this would be a good opportunity to clean up the mess that is IRCClient and add a division like, for example, the division that exists in POP3, where IMailbox is the cred interface and that's what you implement. On the client side, we obviously can't use cred, but we can pass in an application object which receives callbacks rather than subclassing. On the server side, IUser is already the right way to implement something. Please file a new ticket for this, and reference that ticket from Prefix's docstring so that future maintainers will know where this code is headed.
twisted-trac commented 15 years ago
jonathanj's avatar @jonathanj set owner to @glyph
twisted-trac commented 16 years ago
Wolf's avatar Wolf commented

Replying to Wolf:

I implemented all the options, using a default PREFIX = '@%+'. I just notice that the PREFIX value sent by the server is like PREFIX=(ohv)@%+ and I'm not sure what is the best way to store this values, because sometimes the server uses '@%+' and sometimes 'ohv'.

The operations performed on this values are prefix in PREFIX (while checking if a prefix is valid) and it has to work for '@%+' and for 'ohv', PREFIX['@'] -> 'o' and PREFIX['o'] -> @ (while converting the prefixes). Probably 2 dicts with keys and values swapped is the best way if noone has a better solution...

twisted-trac commented 16 years ago
Wolf's avatar Wolf commented

This function could also parse the users in the form '@nick', returning the tuple ('@', 'nick).

It could then accept 3 things:

  1. 'nick!realname@host' -> 'nick', 'realname', 'host'
  2. '@nick' -> ('@', 'nick'); 'nick' -> (None, 'nick') or ('', 'nick')
  3. ['@nick1', '%nick2', '+nick3'] -> [('@', 'nick1'), ('%', 'nick2'), ('+', 'nick3')]

It could also raise an error if it receives something like 'nick!foo' or 'nick@bar'.

In order to have 2) and 3) #3285 should be implemented.

twisted-trac commented 16 years ago
Wolf's avatar Wolf commented

I implemented all the options, using a default PREFIX = '@%+'. I also added a new exception (InvalidUser) and a dozen of tests.

I also replaced all the nick = user.split('!', 1)[0] with nick = parseuser(user)[0] (lot of them still used the string.split function instead of the split method).

twisted-trac commented 16 years ago
Wolf's avatar Wolf commented

Instead of the PREFIX constant I used two different constants: USERPREFIX and USERMODES.

They are two dictionaries with keys and values swapped (i.e. {'@':'o', '+':'v'} for the prefixes and {'o':'@', 'v':'+'} for the modes) so it's possible to do if prefix in USERPREFIX: ... and mode = USERPREFIX[prefix] (and viceversa).

Both the constants have a default value, used if the server doesn't provide the list of accepted prefixes. I also wrote the parse_isupport_prefix function to parse the PREFIX value sent by the server (in the form '(ohv)@%+') and this should be used to set their values. There's a new InvalidPrefix exception and 5 tests used by this function.

twisted-trac commented 16 years ago
Wolf's avatar Wolf commented
  • Naming convention This is not a problem, I used parseuser because there is already a function named parsemsg.

  • Tuples kind of suck. Objects with attributes are pretty much always better. Using objects lead to several problems:

    • creating disposable objects just for being able to access the values doing obj.attribute is probably a waste of resources (an answer to NAMES could involve hundreds of users)
    • if we save the objects somewhere we will have to do a list or a dict of all the users and their instances, check if the user already exists before create a new instance, delete it when the user quits, update it when the user changes nick and so on. Moreover, as in the aforementioned case of the NAMES, we may end up with hundreds of instances that we don't need. I don't think that introducing object now is a good idea, even if they could be used in a new version of irc.py that soon or later has to be done.
    • while parsing a user like '@nick' we can't save the '@' in a modes attribute because it is related to a specific channel where the user is operator, so we will have to pass the chan variable to the function, save the value in a dict with {channel: modes, ...} and using something like user.modes[chan] to have back the '@'. I agree that parseuser(user)[0] is quite anonymous, I chose to return a tuple thinking to a situation like nick, realname, host = parseuser(user) but in the code of irc.py usually only the nick is used. Using a dict instead of a tuple will lead to a more readable nick = parseuser(user)['nick'] even if we will need to do user = parseuser(user); nick, realname, host = user['nick'], user['realname'], user['host'] when we want all the 3 values. This is more verbose but more readable too, so it could be a good alternative to tuples.
  • Global variables are bad. I agree with this too and indeed the right place of these constants is in the IRCClient as instance attributes, because every instance could have different prefixes. I don't know if these prefixes are used by other classes but I will check. Moving there these values would mean to make parseuser() and parseISupportPrefix() methods of IRCClient instead of a module-level functions.

About the problem of splitting parseuser() in more function, we need to handle 4 situations:

  1. a user like 'nick!realname@host'
  2. a user like '@nick' or 'nick'
  3. a list of users like 'nick!realname@host'
  4. a list of users like '@nick' or 'nick' Every message returns one of this values (though I don't know any message that returns 3.), and parseuser can detect without problems what it received. The case 1 could return a dict and the case 2 a tuple or a dict. We could then:
  5. leave parseuser as it is, so it can detect what it receives and return the right thing;
  6. doing a parseUser or parseMask function that handles the cases 1 and 3 and a parseNick function that handles 2 and 4;
  7. doing 4 different functions like parseUser or parseMask, parseNick, parseUsersList, parseNicksList but we may end up with have too many similar function that lead to useless complications and the user that does nick = user.split('!')[0] (as actually irc.py does).

Let me know what option you think is better so I can edit the patch.

twisted-trac commented 16 years ago
Wolf's avatar Wolf commented

I'm working on [#3285](https://github.com/twisted/twisted/issues/3285) and I moved there the parseISupportPrefix, I will probably fix this after #3285 will be closed.

twisted-trac commented 15 years ago
jonathanj's avatar @jonathanj commented

I'd say that a function that can parse user prefixes as well as nicknames with a status prefix symbol is just confusing and wrong. They're two completely separate issues. Correctly parsing nicknames with status prefixes also involves ISUPPORT, which is more suited to being a part of IRCClient, as far as the current state of the code is concerned.

Ideally any "event handler" (methods of IRCClient which are called because something happened, e.g. IRCClient.userJoined) should be passed all information about the user (their nickname, ident and hostname), not just the nickname. This behaviour is rather hard to implement while maintaining backward compatibility, and should probably be implemented as part of a new (and possibly better) IRC client protocol, that would be a motivation for using an object as opposed to a tuple.

twisted-trac commented 14 years ago
inhahe's avatar inhahe commented

about userparse, i just happened to put my own version of this into my own copy of irc.py it goes like this:

usersplit = re.compile("(?Pnick>.*?)!(?P<ident>.*?)@(?P<host.*)").match

so if i wanted, for example, the nick and host of "foo!bar@baz", i can just do nick, host = usersplit("foo!bar@baz").group("nick", "host")

this prevents having to use dummy variables if you don't want all three elements, and also does away with the problem with raising exceptions over malformed strings. i dno't know if regex is a little less efficient than two split commands, but it's not like we'll be parsing nicks 10,000 times per second.