python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.17k stars 339 forks source link

Should serve_* take positional *args and pass them on to the handler? #563

Open njsmith opened 6 years ago

njsmith commented 6 years ago

Apparently @dstufft finds it annoying that you can't pass positional arguments through serve_tcp. He has a point! It is weird, when basically every other function like this in trio does allow you to pass-through positional arguments.

There are two issues. First, say we pass handler and *args to serve_tcp. Does this ultimately call:

handler(stream, *args)

or does it call

handler(*args, stream)

? @dstufft suggests that it should call handler(stream, *args), because this allows you to easily write code like:

async def handler(stream, *args):
    pass_arguments_through_to_someone_else(*args)

which is much more awkward with the handler(*args, stream) form. That seems fairly convincing, though I'm still a bit nervous that people will be confused. (Even if one option is better once you think about it a lot, that doesn't mean it's obvious to a new user!)

The other issue is that if we added this right now, we'd have calls like:

await serve_tcp(handler, port, *args)
    → await handler(stream, *args)

await serve_ssl_over_tcp(handler, port, ssl_context, *args)
    → await handler(stream, *args)

await serve_listeners(handler, listeners, *args)
    → await handler(stream, *args)

It's weird how the the final call gets assembled by taking the first argument, and then skipping some arguments, and then taking the rest.

Here @dstufft also points out that we could make these kwargs, so it'd be:

await serve_tcp(handler, *args, port=port)
await serve_ssl_over_tcp(handler, *args, port=port, ssl_context=ssl_context)
await serve_listeners(handler, *args, listeners=listeners)

This is... pretty compelling actually, even on its own merit. Trio's general rule is that in runner(callable, *args, **kwargs), the positional args belong to callable, and the kwargs belong to runner. Moving serve_*'s configuration arguments to kwargs would make them much more consistent with the general rule. Plus writing serve_tcp(handler, 80) has always kind of bothered me (what's that random magic constant?). But serve_tcp(handler, port=80) looks quite nice. I'm actually not sure why I didn't do it this way in the first place. Maybe because I'm still stuck in the past and forgot we can actually use mandatory kwargs now?

If we do implement both of these changes then it'd be a kind of disruptive transition, where every serve_tcp call needs to convert from serve_tcp(handler, port)serve_tcp(handler, port=port). But I do think we can do a regular deprecation warning, so it's not wildly out of line with other API adjustments we've made.

So I'm... tentatively kind of thinking this might be a good idea? Not entirely sure yet. What do y'all think?

dstufft commented 6 years ago

An additional benefit here is that if people switch around from serve_tcp, to serve_ssl_over_tcp, to serve_listeners, I imagine it's not uncommon to accidentally send the wrong parameters into the wrong function. Presumably when that happens now, you either get a too many/too few parameter exception or I'm assuming duck typing takes over and you get some weird error when you try to use an integer as a listener or something like that.

However, with mandatory kwargs, you get much better error messages, something like: TypeError: serve_listeners() got an unexpected keyword argument 'port' instead. That seems like an additional win to me.

smurfix commented 6 years ago

I agree that this change would be beneficial. … which reminds me that I should build a new release …

sorcio commented 6 years ago

But I do think we can do a regular deprecation warning

By this, do you mean this?

1) first, make a release that issues a DeprecationWarning when serve_* are called without kwargs 2) at some point, make a release with only mandatory kwargs 3) only after that point add positional args

Or do you want to add positional args straight away, with the positional confusion, and in the same release document serve_* other args to be keyword only and issue DeprecationWarnings?

njsmith commented 6 years ago

The ideal deprecation cycle is: there should be some period of time where the old thing works the same way it used to, and the new thing works the way it will work in the future, and the old thing gives a warning.

So in this case, we'd want to have some time where serve_tcp(handler, port) works the same way it does now while issuing a warning, and serve_tcp(handler, port=port) also works, but doesn't issue a warning. Implementing this is a little awkward, but definitely doable.

I guess we could add positional args straight away, with the proviso that they only work if port= is also passed. So serve_tcp(handler, blah, port=80) means serve on port 80 and then call handler(stream, blah), but serve_tcp(handler, blah) issues a warning and then does serve_tcp(handler, port=blah). Something like:

_not_given = object()
async def serve_tcp(handler, *args, *, port=_not_given, ...):
    if port is _not_given:
        if len(args) != 1:
            raise TypeError
        warn_deprecated(...)
        (port,) = args
        args = ()
    # Now we can act like we're in the future case
    ...
sorcio commented 6 years ago

Thanks for the clarification! Makes perfect sense to me. I support the change.

Any other part of the API that still doesn’t use the same pattern? I don’t remember any.

njsmith commented 6 years ago

Here's a mild annoyance that happens if we make port and friends become mandatory kwargs instead of positional args – it becomes impossible to start a background server without using partial:

# Before
await nursery.start(serve_tcp, handler, port)

# After
await nursery.start(partial(serve_tcp, handler, port=port))

This isn't a huge showstopper or anything, but I missed it before, and it is a bit annoying given how commonly these two operations are combined. And that the original reason @dstufft suggested making this change was that he wanted to avoid using partial :-). In fact for those actually using the new *args support, I guess the relevant comparison might be:

# Before
await nursery.start(serve_tcp, partial(handler, *handler_args), port)

# After
await nursery.start(partial(serve_tcp, handler, *handler_args, port=port))
Zac-HD commented 6 years ago

FWIW I really really like keyword-only arguments, and would be happy to see them basically anywhere in Trio. They're harder to accidentally misuse, easier to compatibly change later, and clearer when reading calling code! If it wasn't for the whole "invalid on Python 2" thing, I'd use them for practically everything - IMO two positional arguments is a sensible limit 😄

I've also argued for more use of partial(...) before, so that doesn't bother me and I find the after version more elegant and explicit in the second case.

njsmith commented 5 years ago

Looking at this again, I noticed another piece of potential awkwardness: if we make it so serve_tcp(handler, *args) ends up calling

await handler(stream, *args)

...then that means that serve_tcp(handler, *args) will have a different meaning than serve_tcp(partial(handler, *args)), because the latter will eventually run

await handler(*args, stream)

OTOH, @dstufft's point above is still pretty compelling – that if you have a wrapper, you can write async def handler_wrapper(stream, *args), but async def handler_wrapper(*args, stream) does not do what you want.

Options to consider:

Zac-HD commented 5 years ago

I remain very fond of partial - it always works, doesn't break abstractions, and is back-, forward-, and sideways-compatible.

mentalisttraceur commented 4 years ago

Couple of thoughts:

  1. I want to do foo(port) or foo(port=80). I prefer not to do foo(port=port). I would probably push back against foo(80) in a code review.

    When it's a variable named something clearly self-descriptive like port, passing it positionally is on balance a little better - when it is a number, passing it keywordally is much better.

    So if I had to sacrifice one to enforce the other, I would side with keyword-only.

    But in practice, almost all code I've ever written that dealt with ports ends up with the port in a variable named port or {{ something }}_port. The only exceptions are off-the-cuff scripting situations and scripts so tiny that they just use argv values directly without running them through argparse (or docopt, which is much nicer to use but not in the standard library).

    So arguably making it keyword-only would make it both worse in the common case where the code is going to be read again, and worse in the common case where the code is not going to be read again.

  2. Don't worry about "inconsistent with partial", that's not as bad as it seems.

    It's worth thinking about when you actually want to call a handler with a signature stream, *args or *args, stream. Usually your handler will have a signature stream, foo, bar or foo, bar, stream - the order is not a big deal in that case.

    So my intuition is that most cases actually end up in code where all arguments can be individually pulled apart by name, and that the overwhelming majority of cases can be written that way.

  3. About this:

    # Before
    await nursery.start(serve_tcp, partial(handler, *handler_args), port)
    
    # After
    await nursery.start(partial(serve_tcp, handler, *handler_args, port=port))

    Both of these take me enough mental effort to parse from a cold read (not warmed up to the patterns common in Trio or these functions specifically) that my reflex would be to push back against either.

    Here's what I would want to do instead:

    handler_ = partial(handler, *handler_args)
    serve_tcp_ = partial(serve_tcp, handler_, port)
    await nursery.start(serve_tcp_)

    Yeah, it's more annoying to write. But it's very lightweight to read!

    Yeah it is visually more stuff. But it is simpler stuff that requires less mental load and precise visual work to rigorously parse. Yeah it might feel more complex in that you have to parse and combine two partials in your mind. But that complexity is objectively there and we need to mentally drag it out of the other forms too to rigorously step through what exactly the code does in our mind. This form just forces it to be explicit.

  4. There is a composable way that people can use to skip writing partial at every call site!

    def handler(...):
        ...
    handler = partial(partial, handler)
    
    # or to name the pattern:
    def partialize(function):
        return partial(partial, function)
    
    # then you can use it as a decorator too
    @partialize
    def handler(...):
        ...

    Then your example use case can look like this:

    await nursery.start(serve_tcp, handler(*handler_args), port))

    You can also use curry from a library like functools or funcy, but this "partialize" thing does not require any dependencies, is a one/two -liner, partializing does not struggle with variadic functions like currying does, and does not require people to wrap their heads around currying.

mentalisttraceur commented 4 years ago

Oh I totally missed replying to this one:

  • Pass the stream first, args second, AND if the fn is a functools.partial object, then inject the stream at the front of the saved args: pretty abstraction-breaking

Understatement. You probably already see what I am about to say, but for anyone who doesn't:

This would break partial in a profound way, completely violating what anyone used to partial application would expect.

I literally would never have thought to worry about a function doing this to me if I passed a partially applied function in, until I read this. Now I won't be able to sleep at night. "Nothing is certain... Nothing can be trusted..." I mutter as I sit curled up in the corner, rocking back and forth, terror in my wide and unblinking eyes.

If I ever hit a function like this in the wild I would immediately code my own implementation of partial just so that I could always know no other code could be written to detect and modify my arguments out from under me.

This would be like if we overloaded a substraction operator so that a - b actually computed b - a!

By the way consider:

  1. reverse/from-the-right partial application (see funcy.rpartial; in other languages I have also usually seen it called partialRight or the like), so the caller can just do:

    handler_ = rpartial(handler, *handler_args)
    serve_tcp_ = partial(serve_tcp, handler_, port)
    await nursery.start(serve_tcp_)
  2. composable ways of reversing/flipping the signature of a callable such as:

    def flip(function):
        def flipped(*args, **kwargs):
            function(*reversed(args), **kwargs)
        return flipped
    
    handler_ = partial(flip(handler), *handler_args)
    serve_tcp_ = partial(serve_tcp, handler_, port)
    await nursery.start(serve_tcp_)

Notice the elegant separation of concerns: you just worry about calling the callable with exactly the arguments you want to pass, in the positions/keywords it makes sense to pass them in, your caller worries about how their functions actually get those arguments, and a few simple composable functions take care of doing just that one job well.

TL;DR: Modifying the order of passed in partial objects' arguments harms the very benefits that partial brings.

mentalisttraceur commented 4 years ago

Wait I just realized! You can just write a generic wrapper that lets you give all ~Trio~ functions basically this behavior!

def with_args_to_handler(function):
    def wrapper(handler, *args, **kwargs):
        return function(partial(handler, *args), **kwargs)
    return wrapper

Then you can just do:

serve_tcp = with_args_to_handler(trio.serve_tcp)

And then call it as you wanted:

serve_tcp(handler, foo, bar, port=80)

You can change the with_args_to_handler wrapper to use an rpartial implementation if you want the *args to go in the back, but really I don't think it actually matters... So what if your signature is def handler(foo, bar, stream) instead of def handler(stream, foo, bar)? And I don't think it is ever necessary or even ever best to have a handler with the signature def handler(stream, *args).

mentalisttraceur commented 4 years ago

Taking that last idea further: you could even provide such wrappers with Trio.

You could even provide a file within Trio that people could import, which has all the Trio functions wrapped like this.

I mean it needs a little more work for class methods or whatever, but you could generalize this to the point that Trio's code never needs to pass through *args or **kwargs, and yet that behavior is within reach for those who want it.

Then instead of a bunch of case-by-case design decisions, having to try to balance the wants of people who want to pass through *args and people who want to pass through **kwargs and people who want to not type partial, and the need to have arguments that you don't pass through, and how to order them, you can just have a couple of wrappers that implement a few of these generic, sensible variants. And optionally provide them out of the box, or just document them in a "recipes" section somewhere.

Even if you leave places that already follow the "*args for function, **kwargs for our options" as-is because of convenience or inertia, it's a general strategy you can start using to solve this problem elsewhere.

And you can use the same strategy to solve the "but I wanna pass **kwargs without typing partial!" too:

def with_kwargs_to_handler(function):
    def wrapper(handler, *args, **kwargs):
        return function(partial(handler, **kwargs), *args)
    return wrapper

def with_args_and_kwargs_to_handler(function):
    def wrapper(handler, *args, **kwargs):
        return function(partial(handler, *args, **kwargs))
    return wrapper

But probably give them nicer shorter names, like pass_args, pass_kwargs, pass_args_kwargs.

Edit: Note that you can't do these wrappers as generically and composably in the other direction! For every *arg you have to convert into a **kwarg and every **kwarg you have to convert into an *arg you have to specify the arguments by name. But if the base function just takes a callable and just the arguments it does not pass through, then these wrappers can be perfectly general.

mentalisttraceur commented 4 years ago

Now a concise final reply to the main question: I slightly recommend against, because

  1. Accepting port as a positional argument reduces stuttering when the port is in a properly named variable, which it often ought to be.

  2. p = functools.partial is a thing people can do to reduce typing in cases where it matters enough.

  3. Partializing or currying is trivial to do in Python and also eliminates partial at the call sites.

  4. Separate explicit partial applications scale better in most respects other than writing them.

  5. You can just write a generic wrapper that gives all Trio functions this behavior composably.