jaraco / irc

Full-featured Python IRC library for Python.
MIT License
390 stars 84 forks source link

Asyncio Client improvements #146

Open zinsserzh opened 6 years ago

zinsserzh commented 6 years ago

After thoroughly examining the module, I found multiple improvements could/should be done to asynchronous code. I'd like to open a discussion before actually implementing the code because some changes will break the current API.

def __init__(self, on_connect=__do_nothing, on_disconnect=__do_nothing, *, loop=None)

jaraco commented 6 years ago

Those all seem like reasonable ideas. The AsyncIO support is fairly new, so as long as @MattBroach is comfortable with the changes, I'm happy to accept them (even if backward-breaking).

zinsserzh commented 6 years ago

Sounds great. I'll go ahead and start the implementation. My own project needs it anyway :D

zinsserzh commented 6 years ago

Do you mind if I also clean some code for the non-async part along the way if I happen to find any? I won't make big changes. If so I'd make a new PR. E.g. in connection.py, identity function could be replaced by lambda x: x.

MattBroach commented 6 years ago

Just popping in to say this all sounds great to me. As @jaraco sad, the asyncio stuff was just added, so now is the best time for breaking changes. Thanks for the thorough review.

zinsserzh commented 6 years ago

Guys, I start to feel that it's hard to maintain two versions in a single module. A lot of non-async code depends on raw sockets where async code uses protocol and transport extensively. Plus that non-async code needs to support both py2 and py3, which makes it harder to separate pieces and bits to implement py3-only code. I think it's easier to work on a fork and make another package that only supports async irc code and release separately. What do you think? I am willing to maintain the fork.

jaraco commented 6 years ago

Do you mind if I also clean some code for the non-async part along the way if I happen to find any?

In general, this sort of contribution should be handled as a separate PR to be accepted first. You can still implement your asyncio improvements on top of these incidental changes, but then they'll be dependent on the acceptance of those changes and might require resolving of merge conflicts if not accepted exactly as presented. But this project is happy to accept all sorts of changes as long as the concerns can be kept separate. If the changes are small and non-controversial, it's fine to implement those in the same PR, just use separate commits (same guidance, just more lightweight).

E.g. in connection.py, identity function could be replaced by lambda x: x.

You'll find the linter won't accept identity = lambda x: x (although that would have been my stylistic preference as well). You could move the change inline (i.e. wrapper=lambda x: x), but that conveys less information.

Honestly, if I were you, I'd try to look past these stylistic changes and focus on the semantic or structural changes. But again, happy to review any proposed change.

I think it's easier to work on a fork and make another package that only supports async irc code and release separately. What do you think?

A fork, especially in a different package/module, is just asking for trouble. It means creating two lines of development on a project that already has very little development.

I do see the appeal - that maintaining simple socket compatibility is hard - hence why we have something of a fork already with two implementations. I'd rather not increase the divergence with yet another package. It seems reasonable at first until we get the next bug report that affects the common parts of the two libraries - then we have to expend extra effort to ensure that both implementations get that functionality (or otherwise manage the increasing cognitive distance between the two projects).

Ideally, we could refactor the code in such a way that any event loop / IO handler could be chosen, and the rest of the functionality lies atop that, although I suspect that's easier said than done (if possible).

If a proper loop abstraction isn't possible or achievable, I'd be more inclined to drop support for Python 2 and a select loop, and rely entirely on async IO and Python 3 for the library (a backward-incompatible release).

So in that sense, I do support a fork, one in which the select-based functionality is left behind in a maintenance branch that's unlikely to get backports of functionality from the master. If you can bring the asyncio functionality to feature parity with the select loop, then I'm happy to accept that as the sole event loop for the library.

To the extent that you can, try to implement the changes with many minimal steps with stable commits.

Thanks!

zinsserzh commented 6 years ago

How about this, I'll put all async code in a subfolder, i.e. irc/aio/ which has a similar directory structure as irc and start to refactor the code, separate shared code into base classes (BaseServerConnection, etc). For things that are hard to split, I'll copy some code from irc to irc/aio/ and modify it.

Btw some of the dependencies (jaraco/*) does not support async code as well (e.g. Throttler). How do you want me to deal with it? Add more things to your modules (which appear not to be open source) or implement it in another file in this particular repo?

Btw, I do think wrapper=lambda x: x makes things super clear as it tells the reader wrapper is supposed to be a function/method/closure and its default value is returning the socket untouched (with that in mind maybe wrapper=lambda sock: sock is even better. wrapper=identity requires me to check what is identity and what it does by searching the name in the file.

jaraco commented 6 years ago

I'll put all async code in a subfolder

That seems reasonable. I'd especially like it if there were symmetry with another package that served for non-async mode. For example, there's irc.aio and irc.select and each implements equivalent functionality. But that's not a prerequisite.

some dependencies do not support async code

Ugh. Well, I guess that's an unfortunate situation, but not surprising. Probably best would be to make those features compatible with async as well. I'd be happy to accept those improvements as well. I would like them to be contributed back to the canonical project, all of which are open source (if they're not, that's a bug).

I see Throttler is imported from jaraco.functools, found at GitHub.

If you wish to test with unreleased versions of jaraco.functools, it's easy to:

irc $ .tox/python/bin/pip install -e ../jaraco.functools

Then run tests or exercise the IRC library against the in-development version of jaraco.functools before committing/pushing/pull-requesting.

Btw, I do think wrapper=lambda x: x makes things super clear... wrapper=lambda sock: sock is even better.

Yes, I agree. I think I was planning to move identity to another library (as I've used that pattern in more than a dozen places). It's the kind of thing that I feel should be in the stdlib, like None only for functions. But since it's not, it has to be explicit throughout the various implementations.

That said, there is a tension with lambda. Many Python developers, especially those unfamiliar with functional programming paradigms, find lambda objectionable. I'm perfectly comfortable with it and my sensibilities align with yours here. Just be aware that others may come along later and wish to change it (back).

In any case, sounds like you're on the right track.

jaraco commented 5 years ago

Since this discussion began, this project has dropped support for Python 2 and the async implementation has expanded, so progress is being made.