trio.run() should take **kwargs in addition to *args

python-trio / trio

Trio – a friendly Python library for async concurrency and I/O

https://trio.readthedocs.io

Other

6.06k stars 330 forks source link

trio.run() should take **kwargs in addition to *args #470

Open kennethreitz opened 6 years ago

kennethreitz commented 6 years ago

this would greatly enhance usability — especially for function declarations like:

async def request(method, url, *, pool=None, preload_content=False):

njsmith commented 6 years ago

Hmm, this has come up a few times, and it's a totally reasonable question – I think we need some sort of FAQ entry for this... maybe in the tutorial? Will think about it.

Anyway, here's the problem: a function like run has to take the thing to run, and it also has to take its own arguments to configure how it runs it. So we need some way to distinguish which arguments are for which, but all the possible solutions involve some kind of awkwardness.

BTW, in ordinary usage you wouldn't use trio.run to call request, that wouldn't make much sense – you normally call trio.run once at the very top of your program, so it's usually something like trio.run(main). (I guess you're looking at making synchronous shim wrappers or something?) But there are a bunch of functions in trio that have this same problem: nursery.start_soon, run_sync_in_worker_thread, etc. etc., so we do need a general solution.

What we do is recommend people do from functools import partial and then trio.run(partial(request, method, url, pool=my_pool)) or whatever. I'm not a huge fan, but all the alternatives seem worse. Some advantages of this are: (a) it works uniformly for async and sync callables, (b) it's ordinary standard Python, so once you learn it you can use it everywhere, with or without trio, (c) partial objects support introspection, so e.g. if you start a task like this and then later use trio's debugging interfaces to print out the task tree, then you'll see that trio has figured out that this task is running a function named request, not one named partial or <lambda> or something, (d) it allows extra arguments to be added, which is kind of a niche thing but important for nursery.start (note: different from nursery.start_soon), (e) the syntax really isn't that bad, and in particular supports all of Python's normal funcall syntax like partial(request, *args, extra_arg, something=whatever, **my_kwargs). The left edge looks a little funny, but otherwise it's using function call syntax to represent a function call.

Rejected alternatives include:

YOLO it up: run takes its arguments, any unrecognized arguments are passed on to the called function. But if we did this, then we could never add any new kwargs to run (or similar functions) without breaking backcompat, which isn't workable in a foundational library like trio that needs to (eventually) be super careful about backcompat.
Reserve underscored kwarg names as being for trio.run, and the rest get passed on to the function being called: this is kinda gross, and doesn't really solve the problem, because there's nothing that says an arbitrary function can't take kwargs that start with underscores. In particular, by implementing this we would be defining a dozen such functions! And some of them you might want to pass to each other in unusual-but-they-happen situations.
Use a special argument for arguments to run, like def run(fn, *args, **kwargs, run_args={}): this is super awkward to use and document (the signature no longer tells you what arguments you can pass in run_args), plus still has the issue of potential kwarg name collisions (what if fn has a run_args argument).
Use a special arguments for kwargs for the function, like def run(fn, *args, *, fn_kwargs={}, ...): this is uglier to use and read than partial (since now you have to write kwargs using dict syntax, while partial uses call syntax), and it's still an idiosyncratic special trick we'd be forcing people to learn. If we're going to make people learn an idiosyncratic trick anyway, partial is much more generally useful.
Allow raw coroutine objects, like def run(fn(*args, **kwargs), ...): it doesn't solve the problem for variants that take sync functions like run_sync_in_worker_thread, it doesn't work for more sophisticated nursery-like objects that need to control how they run the function passed to nursery.start_soon (think: Erlang-style supervisors that can restart services after a crash), and it would make it impossible for us to ever fix the super-annoying wart that new users always run into where they forget a mandatory await and Python doesn't give them a proper error.

If you've got a better idea definitely do let me know but... what we do now really is the best option available, AFAICT :-).

cjerdonek commented 6 years ago

a function like run has to take the thing to run, and it also has to take its own arguments to configure how it runs it.

Here's another option I don't see listed. What about not supporting passing configuration options directly to run() at all? Instead, if non-default behavior is desired, the caller can instantiate, say, a TrioRunner object with those options and use runner.run(). It looks like you're almost doing that here already anyways, but with a Runner class that has a slightly different responsibility, and perhaps not exposed as part of the public API.

There's something to be said for a clean function signature that doesn't mix arguments for async_fn with arguments for run. It also provides more options for configuring run() and may age better, too, because you won't have to add more and more kwargs over time. (One of them is already called restrict_keyboard_interrupt_to_checkpoints :), which might be better as an attribute.)

njsmith commented 6 years ago

What about not supporting passing configuration options directly to run() at all? Instead, if non-default behavior is desired, the caller can instantiate, say, a TrioRunner object with those options and use runner.run().

Interesting idea! And for trio.run alone, this would probably work. But, the same problem happens for:

trio.run
nursery.start_soon
nursery.start
trio.run_sync_in_worker_thread
trio.BlockingTrioPortal.run
trio.BlockingTrioPortal.run_sync
trio.hazmat.spawn_system_task
trio.hazmat.TrioToken.run_sync_soon

And then trio-asyncio has the same problem for its functions to call trio-from-asyncio and vice-versa, and I'm sure this will keep cropping up around the ecosystem.

Adding a configobject interface for all of these seems really cumbersome? And I'd rather have just one solution to this problem that gets used everywhere, so you only have to learn it once, rather than two different ones so then you have to learn both solutions, and which solution is used where.

cjerdonek commented 6 years ago

Okay, good to know. There's yet another hybrid pattern / approach you could use for all of these. Namely, rather than calling, say, runner.run() and putting the config options in the runner object, you could have a single options argument that houses all of the config options and optionally pass that to run().

More generally, using this pattern means that you would have at most one reserved keyword argument name per function. The argument name could potentially be the same across all of the functions, though the class could vary. When you want to add more options to some function, you could add to the appropriate class without needing to touch any of the function signatures. Maybe you could even do something clever with an optional positional argument, so you wouldn't need to choose and reserve a name.

njsmith commented 6 years ago

Ah, yeah, that's similar to the run_args={} idea I mentioned above, except using a class instead of a dict to pass the options. That has the advantage that the class could use funcall syntax to specify the options (options=trio.RunOptions(a=1, b=2) vs. options={"a": 1, "b": 2}). You do have to define the class (though maybe just using a single Options class that just saves its kwargs would be enough for everyone?), and with one reserved kwarg you still have the possibility for collisions.

Maybe you could even do something clever with an optional positional argument, so you wouldn't need to choose and reserve a name.

That's true, you could do something like:

def run(*args, **kwargs):
   # Real code would need more fiddly error-checking but this should give the idea
    if isinstance(args[0], Options):
        options = args.pop(0)
    fn = args.pop(0)
    ...

...and then it'd be used like:

run(fn, foo, bar)
# or
run(Options(clock=whatever), fn, foo, bar)

This feels very surprising though in the Python context.

Again, these can work but... the only real downside to partial is that it forces people to learn one quirky convention, but everything else does that too.

(FWIW, I stole the partial convention from asyncio, so I think that means it also has the Guido Seal Of Approval ;-).)

Maybe the tutorial should just get a short discussion of partial + a link to this thread for those who are curious about the details.

cjerdonek commented 6 years ago

Thanks for summarizing. You stated it well. Yeah, I'm familiar with the functools.partial approach required by asyncio. But it always feels unsatisfying and never natural. For example, each time you're faced with the decision of whether to pass the *args in the partial() call, or pass only the **kwargs and leave the *args for run(). The latter spreads the arguments across more than one invocation, which isn't ideal for code comprehension. And if the former should be used, then why not insist that partial() always be used? That way there'd be only "one way." (Actually, some parts of asyncio's API do require partial, like Future.add_done_callback().)

That has the advantage that the class could use funcall syntax to specify the options (options=trio.RunOptions(a=1, b=2) vs. options={"a": 1, "b": 2}).

It might be worth pointing out that you can also use funcall syntax for dicts, e.g. options=dict(a=1, b=2), though it wouldn't do argument name checking like a class would provide. I often do this for cases where the dict value will wind up as kwargs for some function.

Incidentally, if you'll indulge me, I can't help mentioning a vague, not-fully-formed idea I've had for some time, which is for Python to expose an object that could be used to encapsulate *args and **kwargs in a single object (e.g. pkargs = Args(*args, **kwargs)). It would be some marriage of a tuple and a dict and allow passing *args and **kwargs to a function using a single value (e.g. func(***pkargs) or a similar syntax). Do you know if something like this has ever been proposed?

njsmith commented 6 years ago

For example, each time you're faced with the decision of whether to pass the *args in the partial() call, or pass only the **kwargs and leave the *args for run(). The latter spreads the arguments across more than one invocation, which isn't ideal for code comprehension.

I guess my convention is to always use either run(fn, arg1, arg2, run_arg=...) or run(partial(fn, arg1, arg2, kwarg=something), run_arg=...). I agree that run(partial(fn, kwarg=something), arg1, arg2) would be very weird style.

And if the former should be used, then why not insist that partial() always be used? That way there'd be only "one way."

Yeah, it really comes down to taste. On balance I think the advantages of being able to write run(fn, arg1, arg2) are enough to make it worthwhile, but if you'd rather write run(partial(fn, arg1, arg2)) then go for it I guess :-).

(Actually, some parts of asyncio's API do require partial, like Future.add_done_callback().)

¯\_(ツ)_/¯

It would be some marriage of a tuple and a dict and allow passing *args and **kwargs to a function using a single value (e.g. func(***pkargs) or a similar syntax). Do you know if something like this has ever been proposed?

Huh, no, I haven't seen that idea before.

buhman commented 6 years ago

Personally I'd like to see python add syntax-level support for partial application, then the equivalent of run(partial(fn, arg1, arg2)) is always what happens.

njsmith commented 6 years ago

@buhman if you can convince python-dev of that then we can certainly revisit this :-).

Zac-HD commented 6 years ago

Duplicate of #88, similar to python-attrs/attrs#22.

(FWIW I'm very strongly in favor of partial instead of adding **kwargs)

sscherfke commented 6 years ago

Aliasing partial (e.g., as P) can reduce the pain and visual clutter:

from functools import partial as P

trio.run(P(func, a, b, c=42))

njsmith commented 6 years ago

I have suggested, half in jest, that trio should add trio.withargs or something as an alias for functools.partial, to make it feel less weird and scary.

It might actually be worth considering seriously.

njsmith commented 5 years ago

There was a substantial discussion/brainstorming session about this in chat yesterday, starting around here: https://gitter.im/python-trio/general?at=5c19957cb8760c21bbed7ee9

I think there are 3 reasons there's ongoing tension here: (1) intuitively it seems like people want to pass through kwargs to their function a lot more than they want to use the relatively obscure configuration arguments like start_soon(..., name="myname") or run_sync_in_worker_thread(..., cancellable=True) so it feels like passing through kwargs should be the shorter/easier to read version, and it isn't. This is something we could check by looking at real code. (2) this creates pressure to prefer positional arguments in new APIs, which on the margin means sacrificing other considerations, like readability, (3) partial is jargony and obscure and requires an extra import, which are bad things in a common operation.

Some ideas from the thread:

make nursery.start_soon(fn, args(posarg, kw=whatever)) the One Way to call things while passing args.
nursery.start_soon[options(name="thing")](fn, posarg, kw=whatever). Seems weird and syntax is similar to how PEP 484 does generic constructors (e.g. l = List[int]()).
nursery.start_soon.add_options(name="thing")(fn, posarg, kw=whatever)
nursery.start_soon(fn, posarg, kw=whatever, start_soon_name="thing") (or trio_name or something, maybe, though then we'd lose the ability to error out if someone passes an unrecognized kwarg from the future)
nursery.with_options(name="thing").start_soon(fn, posarg, kw=whatever)
split into nursery.start_soon and nursery.start_soon_configured where the former only accepts arguments to fn, and the latter looks like what we have now.

Occasionally there are rumblings about adding some kind of macro/quoting syntax to Python. If that happens I guess we could have nursery.start_soon!(fn(posarg, kw=whatever), name="thing"), which would look lovely. But it would require stack introspection on every call, which might be too expensive for this.

None of these strike me as obviously superior to what we're doing now, but there are some new ideas I wanted to capture.

Zac-HD commented 5 years ago

I kinda like nursery.with_options(...).start_soon, actually.

njsmith commented 5 years ago

More brainstorming in chat: https://gitter.im/python-trio/general?at=5c4a4449f780a1521f638f59

dhirschfeld commented 5 years ago

Put my 2¢ in @ https://gist.github.com/njsmith/ad4fc82578239646ccdf986ae3ca07c1

Curious if anyone else thinks that's a good api? s/config/options/with_options/add_options depending on taste.

smurfix commented 5 years ago

Another idea, see https://gitter.im/python-trio/general?at=5c4b3e3e20b78635b67fa78e

Basically this would allow you to call .start_soon(fn, *args, **kwargs) or .start_soon(**start_soon_options)(fn, *args, **kwargs), which is syntactically as minimal as you can get

dhirschfeld commented 5 years ago

It's certainly possible to switch behaviour on whether or not fn is passed in but that's probably too much black-magic for my tastes.

The verbosity of options/config/with_options does grate a little but it has the benefit of being pretty explicit and obvious.

dhirschfeld commented 5 years ago

Another api I was considering was forcing users to use a method to call the func, thereby leaving the __call__ free to configure the options - e.g.

class takes_callable(object):
    def __init__(self, wrapped_fn, **options):
        self.wrapped_fn = wrapped_fn
        self.options = options

    def call(self, fn, *args, **kwargs):
        return self.wrapped_fn(
            partial(fn, *args, **kwargs),
            **self.options
        )

    def __call__(self, **options):
        return takes_callable(self.wrapped_fn, **options)

In [41]: @takes_callable
    ...: def run(fn, **options):
    ...:     print(f"options = {options!r}")
    ...:     return fn()

In [42]: def func(*args, **kwargs):
    ...:     print(args)
    ...:     print(kwargs)

In [43]: run.call(func, 1, 2, q=3)
options = {}
(1, 2)
{'q': 3}

In [44]: run(clock=None).call(func, 1, 2, q=3)
options = {'clock': None}
(1, 2)
{'q': 3}

I think this might be a cleaner api but also less obvious/discoverable

himtronics commented 5 years ago

IMO the run*()/start*()/spawn*() methods all need to transform a function call (function + parameters) into a task that will be run/started/spawned sooner or later with possibly some additional options on how to do that.

Splitting the spec of the function call over multiple parameters of the run*()/start*()/spawn*() methods is not a good idea as this is something belonging tightly together and that is what partial(fn, *args, **kw) does, storing the parameters in a single object though the name of the method is sub optimal for trio's purpose.

As trio does not necessarily need the __call__ functionality it could just use partial() for storing the info and take fn, *args and **kw from that object when creating the underlying Task() so that partial() is not on the call stack.

trio could import partial under a different name (or declare a class that just stores its arguments in its __init__(), unfortunately Task() is already taken and FuncCall() a bit long, perhaps FC?,

E.g. start_soon(async_fc, name=None) with async_fc = FC(async_fn, *args, **kw) would allow to extend the run/start/spawn* methods by other positional or keyword parameters.

njsmith commented 5 years ago

Doing some paid work tonight, I'm being annoyed again by having to use functools.partial every time I call run_sync_in_worker_thread.

I think at this point we can be fairly confident we've exhausted all the different combinations of .with_options and variants, and they're all pretty awful.

I'm thinking about revisiting one of the ideas we rejected early on: of using underscores to mark configuration-kwargs, versus no-underscore to mark passthrough-kwargs. So example usage would be:

await trio.run(main, _clock=MyClock)
nursery.start_soon(async_fn, arg1, arg2, kwarg=foo, otherkwarg=bar, _name="something")
await trio.run_sync_in_worker_thread(container.logs, stdout=True, stderr=False, _limiter=my_limiter)
await serve_tcp(handler, _port=80)
await serve_ssl_over_tcp(handler, _port=443, _ssl_context=my_context)

One downside is that it's weird-looking. But I guess we'd get used to it? I think serve_tcp and serve_ssl_over_tcp are the only two functions where we want to pass both an arbitrary callable + mandatory kwargs (see #563), so they're the most affected. It does seem easier to explain and to use than all the .with_options(...) ideas proposed above. I'm not super-excited about the looks, but I can imagine we might decide that using partial all the time is annoying enough to overcome this downside.

The more technical potential downside is lack of compositionality/universality: what if we want to pass through a kwarg that starts with an underscore? For example, nursery.start_soon(trio.run_sync_in_worker_thread, sync_fn, _cancellable=True, _name=sync_fn) – notice that _cancellable is supposed to go to run_sync_in_worker_thread, while _name is supposed to go to start_soon. I think our full set of options are:

Pass through unrecognized kwargs, even if they start with underscore: this is definitely not workable, because it would mean that if we ever added any new kwargs to these functions it would be a breaking change.
Use some sort of quoting rule: for example, any kwarg with double-underscores gets one underscore stripped off, and then is passed through. So our example becomes:
```
nursery.start_soon(
  trio.run_sync_in_worker_thread, sync_fn, __cancellable=True, _name=sync_fn,
)
```
Or, slightly more complicated but maybe nicer to look at, we could look at the kwargs, and of them, all the ones with the maximum number of underscores get taken by the top-level fn, and the rest get passed through. So our example would become:
```
nursery.start_soon(
  trio.run_sync_in_worker_thread, sync_fn, _cancellable=True, __name=sync_fn,
)
```
Document that if they really want to pass-through underscored kwargs, they should use partial:
```
nursery.start_soon(
  partial(trio.run_sync_in_worker_thread, sync_fn, _cancellable=True),
  _name=sync_fn)
)
```
Given how rarely this situation arises, handling it with partial doesn't seem like a huge deal. Also, if we started with this option, then we'd have the option to later extend it to one of the quoting options above, without breaking backcompat. So if we're going to do this at all, then this seems like the variant to start with.

The implementation would be extremely simple. E.g.:

async def run(async_fn, *args, **kwargs, _clock=None, _instruments=()):
    check_no_underscores(kwargs)  # raises an error if any keys start with _
    ...

No magic decorators, so it's totally understandable by naive readers, sphinx, mypy, and pylint. The one wrinkle is that mypy/pylint wouldn't know to flag trio.run(..., _clokc=...) as an error, but that doesn't seem like a huge issue. (And if we really want to then we could fix it with plugins. It seems like we'll have mypy/pylint plugins in any case, to handle things like trio.open_file and missing-await detection – see #671 – so adding this feature wouldn't be a huge extra cost.)

njsmith commented 5 years ago

Oh, wait, there's a second kind of compositionality problem. There are places where trio generates a kwarg: in particular, nursery.start automagically passing task_status=.... And I guess serve functions might end up passing through the stream as a kwarg too (#563), like handler(peer=stream_obj). I think these are the only two places where this comes up.

But, it's not a theoretical problem: the serve functions both take both a callable and also support task_status. So if we say that all their kwargs have to start with _, AND ALSO say that they have to take a kwarg called task_status, then that's a problem!

What options do we have available?

The serve functions could go ahead and take task_status, even though it's not underscored. It would be a bit awkward to explain why we were breaking our own rule, but not necessarily a big deal? task_status is already effectively a reserved kwarg name in trio; it doesn't really make sense to have an incoming connection handler that takes task_status.
We could switch the start protocol to use _task_status as the name of its magic kwarg. This would be an awkward transition b/c there's no good way to deprecate the old way. (I guess our options would be (a) rename start, (b) do a two-round deprecation where we force everyone to add a new_style=True switch and then force them to remove it again.) But eh, given that it's effectively a magic reserved name anyway, the underscore wouldn't look that weird. This would help the serve functions. Are there any cases that go the other way, where you want to pass a function-taking-a-function to start, and have task_status passed through? Of course, we don't support that now either...

Are we worried about similar issues for the peer=/client=/whatever-we-call it arg from #563? I'm having trouble thinking of any situation where you'd want to pass a function-that-takes-a-callable to serve. Maybe run_sync_in_worker_thread, but then you'd want the peer= to be passed through. Probably someone can come up with something, though?

I don't feel like I've fully wrapped my head around the issues here.

smurfix commented 5 years ago

Meh. I still think that nursery.start_soon.mod(name="foo")(fn, *args, **kwargs) would work best ("mod" being the shortest name I could think of that still means something reasonable) given that nursery.start_soon(name="foo")(fn, *args, **kwargs) seems to be too magical for some people's taste. Modifiers on fn can be achieved very easily, by using fn.mod(bar='baz').

I don't like magic underscores; "internal use only" is not the same as "strip the under and feed me to the next part". I'd hate to be required to again resort to functools.partial except now in a different set of circumstances.

I'd leave task_status alone. Its default value wants to be renamed, but that's a different issue.

dhirschfeld commented 5 years ago

IMHO having _ or __ prefixed kwargs is a horrible kludge and think that using partial is preferable to that - i.e. the cure is worse than the disease!

I personally like @smurfix's solution the best (maybe s/mod/config/). The verbosity doesn't really bother me and, to me, is greatly preferable to _ prefix kwargs being implicitly passed as config.

njsmith commented 5 years ago

The discussion around PEP 570 made me realize an interesting corner case... even if we have a signature like:

async def run(async_fn, *args, **kwargs):

...then technically it's not quite true that you can pass through any kwargs. Specifically, you cannot pass through a kwarg named async_fn :-)

This may not really matter in practice (in the unlikely case that someone hits it they can use partial), but I found it surprising.

mentalisttraceur commented 5 years ago

Here is a perspective that I don't rigidly or fully stand by but which is important and I haven't seen said here:

The learning curve for partial could be massively reduced if:
1. every single function like trio.run() just stopped taking *args to pass to the async function as well,
2. the documentation and examples as a result showed partial being used all over the place,
3. (if needed) the documentation linked to a small and really intuitive summary of what partial is and why it is good.
Every time you force people to use partial you make the world a little better, because:
1. partial is amazingly versatile and uses for it tend to turn up in many unexpected places,
2. developers who "think with" partial are thus more versatile developers,
3. passing arguments along with a callable is a distinct code feature, extra functionality, and partial properly pulls and abstracts just that feature into its own composable form, making it unnecessary for people to keep reimplementing it,
4. partial makes it easier to
  - verify correctness,
  - write correct code, and
  - understand what the code is doing
  with less mental overhead, without having to jump around to other parts of the code or documentation to verify what happens to the arguments being passed through.

Personally, both in my own code and at work,

I intentionally avoid, change, and push back against any interface that accepts arguments along with callables anywhere that partial will do, and
I push for partial over functools.partial, even as I advocate for everything else to be properly namespaced, because the latter is a pain to write and because partial application is a fundamental operation, like +.

And I have been happier ever since, enjoying in particular a greater smoothness of thought due to no longer having to do a depth-first search of the possible future effects and uses of this kind of argument passthrough.

I do get frustrated having to write calls to partial sometimes, but then I remember that code is read more than it is written, and that my insistence on partial-only argument passthrough has significant return on investment in people being able to

learn by example,
write more robustly correct code,
write more future-proof code, and
verify code correctness

with less overhead. And that makes the upfront nuisance of writing partial not feel so bad.

TL;DR: forcing people to always use partial is good in some significant, permeating ways that are really worth weighing here.

oremanj commented 4 years ago

I thought of yet another possibility recently: Say that if you want to pass Trio options, you need to pass the function to call through a kwarg too (target=?), and use partial if it takes arguments. If you don't want to pass Trio options, you can pass the function and its positional and keyword arguments without a layer of partial. We distinguish the two cases based on whether we got any positional arguments.

So you'd have nursery.start_soon(fn, arg1, kwarg=value) but nursery.start_soon(target=partial(fn, arg1, kwarg=value), name="hi").

This can be represented nicely in type stubs using @overload for the benefit of static checkers, and is at least reasonably amenable to a clean deprecation.

Downside: it doesn't play very nicely with #563. :-(

mentalisttraceur commented 4 years ago

@oremanj That's a pretty neat idea actually, although based on my experience my immediate intuition is that this feels rather likely to trip people up, especially because it's harder to implement it in combination with #563 (speaking of, I wrote some thoughts on that issue that are relevant here, thanks for pointing that issue out).

Like I'm picturing a situation where someone has nursery.start_soon(fn, arg, kwarg=kwarg) and decides "don't need to pass arg anymore" and deletes it and moves on - because that's the cognition flow that normally works when we no longer need to pass an argument. So to get it right everyone would have to train an additional thought habit: "oh wait does this function completely change how it interprets any of its other arguments based on the the presence of a different argument?"

But developing and diligently using such mental habits is costly, and maintaining that mental habit as a general case would be an uphill battle because almost all of the time the answer would be "no" and it would be time wasted, so the mind would naturally try to unlearn it. And if you had to special case it to certain functions that would be additional mental cost, and would require more proactive, out-of-the-way mental rehearsals of the memory/"cognition snag" that would trigger that mental check in response to it being one of the functions that it applies to.

And that's the kind of situation shape that in my experience leads people to make code changes that they think are so obviously correct that they don't notice were actually wrong until some time/confusion/debugging/deployments later.

Type checkers that can handle overloads would catch it, but in many situations people either don't use them, or only use them at the end of a lot of different modifications, or frequently but still manually, and in any case it would be a jarring "wait... what? why?" experience.

So after further thought I would advise against, but it was good creative thinking.

njsmith commented 4 years ago

Looking at this again after a long absence (partly motivated by https://github.com/python-trio/trio/issues/1104#issuecomment-630000695), I'm wondering why we never considered borrowing Python's dunder convention:

trio.run(f, x=1, __clock__=custom_clock)
nursery.start_soon(f, x=1, __name__="custom name")
await trio.to_thread.run_sync(f, x=1, __limiter__=custom_limiter)

A key thing here is that every one of these kwargs is an exotic feature that beginners will never see. AFAIK there are only two exceptions to that, which are the port argument to serve_tcp and the port and ssl_context arguments to serve_ssl_over_tcp (and tbh the latter is already a semi-advanced topic, because configuring SSLContexts is not for the faint of heart). OTOH serve_tcp does show up in your classic 10-line echo server that you put on the top of your README.

But we can solve that by making it:

await trio.serve_tcp(port, handler, *args, **kwargs)

(i.e., flipping the position of port and handler compared to what we have today).

The one obnoxious bit is backcompat, but I guess this is actually not too bad, because port is always an integer and handler is never an integer, so you can use a runtime type check to detect when someone has the arguments backwards, issue a warning, and flip them the right way around.

oremanj commented 4 years ago

Ooh, I like using dunders a lot better than just leading underscores.

Friendly amendment: can we use _sunder_ names rather than __dunder__ names? There's precedent (enum in the standard library), it's a little less noisy/magical-seeming, and it's still very unlikely to clash with a normal function kwarg.

The reordered serve_* arguments actually read better than the original to me: "serve_tcp on port using handler". We could similarly do "serve_ssl_ with context over_tcp on port using handler", and "serve_listeners these using handler".

oremanj commented 4 years ago

@mentalisttraceur

Like I'm picturing a situation where someone has nursery.start_soon(fn, arg, kwarg=kwarg) and decides "don't need to pass arg anymore" and deletes it and moves on

This would still fall into the "pass all arguments through to fn" case though, because start_soon still has one positional argument: the function itself! I'm proposing that if you want to specify options for start_soon(), you have to wrap the function and its arguments up in a single callable (probably a partial()) and pass that via a kwarg (target=) too, so that the start_soon() call has no positional arguments at all.

I agree that the version that doesn't let you easily call "fn with only kwargs" would be super confusing and non-orthogonal.

@njsmith I'm curious for your thoughts on this approach. I still think I prefer it a bit versus using _sunder_ names, but I could go for either.

dhirschfeld commented 4 years ago

Either __dunder__ or _sunder_ kwargs seems IMHO like a very ugly solution :(

I would personally prefer the status-quo over either. Using partial isn't that great of a burden - I think this is a case of the cure being worse than the cold.

A key thing here is that every one of these kwargs is an exotic feature that beginners will never see.

I think that this comment points to the fact that it isn't a great solution - i.e. it's really ugly, but no one will have to see it anyway. If it's not a great solution for your average user, I don't think we should be foisting it on even expert users - if we can get away without doing so.

A further red-flag for this design is the fact you pointed out that it will actually be exposed in one of the most common usages - specifying a port in serve_tcp. Whilst you point out that this could be avoided by changing the order of the argument from that which would normally be expected I think at this point we should really consider if it's actually providing any benefits:

trio.run(partial(f, x=1), clock=custom_clock)
trio.run(f, x=1, __clock__=custom_clock)

Out of the two api's above I vastly prefer the former (current) - but maybe that's just me?

If we're bound and determined to get away from using partial I'll put in one more plug for @smurfix's solution above (https://github.com/python-trio/trio/issues/470#issuecomment-473648115) of using a custom .mod (or .cfg) method:

trio.run(f, x=1)
trio.run.cfg(clock=custom_clock)(f, x=1)

nursery.start_soon(f, x=1)
nursery.start_soon.cfg(name="custom name")(f, x=1)

await trio.to_thread.run_sync(f, x=1)
await trio.to_thread.run_sync.cfg(limiter=custom_limiter)(f, x=1)

await trio.serve_tcp(handler, *args, **kwargs)
await trio.serve_tcp.cfg(port=port)(handler, *args, **kwargs)

njsmith commented 4 years ago

@oremanj In general I'm not a big fan of APIs where there are two disjoint modes you have to learn, and then learn which one to use when. Try writing out the docs to explain how to use your system – can you make it shorter/clearer than this?

trio.run(async_fn, *args, **kwargs, __clock__=None, ...)

Runs await async_fn(*args, **kwargs).

Arguments used to control trio.run itself have __double_underscores__ around them, to avoid colliding with arguments passed to async_fn.

Args: __clock__ (trio.abc.Clock or None): This argument is used to...

We'd need a bit more text somewhere to explain how to define your own args+kwargs functions, and mentioning partial as a workaround for the unlikely case where you actually need to pass a __dunder__ argument to the inner function, but that could be buried in the reference docs; these are issues that 99% of users will never need think about.

Re: _sunders_ versus __dunders__: I'm not an immediate fan of sunders, mostly because I'd never heard of them until your post :-). Looking around, it looks like a semi-obscure thing that only enum uses? or maybe there are some obscure IPython-related APIs that use it too? Googling python dunder gets me lots of detailed posts, while python sunder gets me stuff like this and this :-).

As far as visual noise goes, I don't think the difference between sunders and dunders is very significant. Once you have underscores at all, you've committed to having a big ol' visual flag that there's something magical going on here; the exact size of the flag is a pretty minor detail. And the nice thing about dunders is that Python programmers are already trained to think of them as a flag meaning "this thing has special-case magic semantics". OTOH if I saw a sunder I'd just be like "that's... weird-looking, is that like a dunder? is it a typo? what is that?".

When you see def __add__(self, other): ..., do you think "ugh, that's so much noise, I wish there was only one underscore", or do you think "okay, yeah, this class is doing something exotic, let's see how it works".

@dhirschfeld

I'm not sure that partial is actually more comprehensible to the median new Trio user as compared __dunder__s. And the problem with partial is that you need to pass kwargs constantly, while you hardly ever use a custom clock. So for 99% of users, the relevant comparison is

trio.run(partial(f, x=1))
trio.run(f, x=1)

Also, re:

await trio.serve_tcp(handler, *args, **kwargs)
await trio.serve_tcp.cfg(port=port)(handler, *args, **kwargs)

port is a mandatory argument, so that first line doesn't make sense. And I'm not putting await trio.serve_tcp.cfg(12345)(handler) at the top of our README :-). So even if we did go with .cfg, then we'd still want to make it serve_tcp(port, handler, ...).

dhirschfeld commented 4 years ago

I was assuming port had a default. If you were going to specify it in a cfg method I'd be inclined to require it to be a kwarg so that it's self documenting:

await trio.serve_tcp.cfg(port=12345)(handler)

...apparently my tastes tend towards the more verbose 😄

oremanj commented 4 years ago

Try writing out the docs to explain how to use your system – can you make it shorter/clearer than this?

This is a great exercise, thank you for the suggestion! I realized that target-as-kwarg is indeed hard to communicate, but I came across another approach that I think can be communicated quite nicely:

trio.run(async_fn, *args, **kwargs) trio.run(trio.call(async_fn, *args, **kwargs), clock=None, ...)

Runs await async_fn(*args, **kwargs).

The second form allows you to customize the behavior of trio.run() by passing any of the keyword arguments described below. (The trio.call() wrapper lets Trio distinguish the arguments for async_fn from the options for trio.run().) The first form is shorthand for when the default behavior of trio.run() is fine, which it usually is.

Args: clock (trio.abc.Clock or None): This argument is used to...

trio.call is a thin subclass of functools.partial.

Why a new name and not just use partial()?

If partial() were sufficiently discoverable and obvious, our users would be using it and we wouldn't need this issue. It's already linked in the Trio documentation.
It gives us a convenient place in the documentation to describe the overall approach, how to write your own functions that follow this scheme, and so on. The text trio.call in the synopsis of each function that uses this approach can be a hyperlink.

Why a subclass and not just an alias?

The different type allows definite disambiguation between the two ways to call run(). If it were just an alias, someone might get into trouble when they unknowingly wrap async_fn in a decorator that returns a partial object.
The different type allows for better static analysis: a trio.call should be invocable with just the listed arguments, while a functools.partial might need more arguments to be specified. (task_status confounds this a little bit, but dealing with that one special case is much nicer than dealing with arbitrary possible additional arguments.)
We can add a __del__ that prints a warning if the object wasn't passed to an appropriate Trio function, so people don't get too confused thinking trio.call is a verb rather than a noun.

Why a subclass and not a wrapper?

partial() is implemented in C on recent CPythons. Subclassing imposes less of a performance penalty for common operations.
We only need to add one attribute and can name it something arbitrarily unlikely to collide with the stdlib.
partial() is already relatively subclassing-friendly; for example, its __repr__ uses its actual type name rather than the string "partial".

mentalisttraceur commented 4 years ago

Can someone remind me of where the wisdom for not taking already initialized coroutine objects is documented?

Because every time this discussion comes up I can't help but notice that if trio did take pre-created coroutine objects, all of this would be a non-issue because they just bundle in their own arguments:

trio.run(async_fn(foo, bar, qux=0), ...)

njsmith commented 4 years ago

Trio strives to be usable by folks who don't know anything about async/await besides "async functions are like regular functions, except you need await to call them", so we treat async and non-async functions the same.

This makes it easier for non-experts to understand the code, and also for static analysis tools to catch common mistakes like missing await.

Also, some of the APIs we're talking about actually take non-async functions, like trio.to_thread.run_sync, so we can't just rely on coroutine objects, even if we wanted to :-)

On Tue, May 19, 2020, 16:05 mentalisttraceur notifications@github.com wrote:

Can someone remind me of where the wisdom for not taking already initialized coroutine objects is documented?

Because every time this discussion comes up I can't help but notice that if trio did take pre-creared coroutine objects, all of this would be a non-issue because they just bundle in their own arguments:

trio.run(async_fn(foo, bar, qux=0), ...)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/python-trio/trio/issues/470#issuecomment-631132907, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEU42BSCC7ZQSVVWH6HJH3RSMGBRANCNFSM4EVWXYJA .

mentalisttraceur commented 4 years ago

Restating a concept I mentioned in #563 that feel very relevant here:

Imagine trio provides the following decorators:

def pass_args(wrapped):
    def wrapper(function, *args, **kwargs):
        return wrapped(partial(function, *args), **kwargs)
    return wrapper

def pass_kwargs(wrapped):
    def wrapper(function, *args, **kwargs):
        return wrapped(partial(function, **kwargs), *args)
    return wrapper

def pass_args_kwargs(wrapped):
    def wrapper(function, *args, **kwargs):
        return wrapped(partial(function, *args, **kwargs))
    return wrapper

Then the use-case in this issue is solved with one line:

run = trio.pass_kwargs(trio.run)

Boom. User wants kwarg passthrough? One line to "enable" it, and now they have it.

mentalisttraceur commented 4 years ago

P.S. Re: @njsmith replying to my earlier comment: Okay, I see, it keeps a symmetry between sync and async functions - that makes sense. Hopefully I'll remember that this time. Thanks!

mentalisttraceur commented 4 years ago

Ohh, right, @njsmith even summarizes the problems with taking coroutine objects up at the top here - and that includes not only what he just said in reply to me but also support for things like supervisors, which want to be able to re-run the function they were given. Okay, yeah, it's all coming back now.

mentalisttraceur commented 4 years ago

Another idea:

Redefine trio.run to take no options - pure passthrough of both *args and **kwargs.
Define a trio.run_with_options, which does no argument passthrough - just takes the callable and options for itself.

I think this covers the common case in a very user-friendly way, and the advanced/unusual cases in a simple way. Both very regular: no special cases.

This approach can also be done with nursery.start_soon, and maybe other Trio APIs.

mentalisttraceur commented 4 years ago

Oh, a relevant reply to this old comment in this issue by @njsmith :

There is a workaround for making sure truly any keyword argument is passed through:

def run(*args, **kwargs):
    async_fn, args = args[0], args[1:]
    ...

Not ideal for any documentation/introspection that relies on the signature, but it does solve the issue robustly on older Python versions that don't have the positional-only arguments feature.

njsmith commented 4 years ago

@oremanj

This is a great exercise, thank you for the suggestion!

Huh, is that really the first time I mentioned it? It's one of the major heuristic I have running in the back of my head all the time :-).

trio.run(async_fn, *args, **kwargs)
trio.run(trio.call(async_fn, *args, **kwargs), clock=None, ...)

Hmm. I'll think about it more, but some initial impressions:

It is nice that it keeps the common case of no-arguments nice and clean.

Getting docs to look like that will require a custom sphinx extension. Writing and maintaining sphinx extensions is not my favorite activity :-).

It definitely feels weird to say that in these two calls, the clock= argument is completely different and unrelated, even though it's being passed to the same function:

trio.run(f, clock=1)
trio.run(trio.call(f), clock=MockClock())

In fact, the type of the clock= argument depends on the type of the first argument. Doesn't this make it harder to mypy-style typechecking (and maybe other kinds of static analysis too)? Mayybe you can do something with @override? Having a single __clock__ kwarg that always means the same thing and always has the same type seems like it would avoid some complexity here.

oremanj commented 4 years ago

Getting docs to look like that will require a custom sphinx extension. Writing and maintaining sphinx extensions is not my favorite activity :-).

I agree that figuring out how to represent this nicely in the Sphinx output is an important precondition for having it be the approach we take. I think I can figure it out, and am offering to do that work if we think it's the best way forward in other respects.

In fact, the type of the clock= argument depends on the type of the first argument. Doesn't this make it harder to mypy-style typechecking (and maybe other kinds of static analysis too)?

This can be represented to mypy just fine with an @overload (once mypy understands "this takes args and kwargs that match its argument" at all, but that's required for any of the proposed options -- it's PEP 612, and work appears to be underway on supporting it in mypy).

It definitely feels weird to say that in these two calls, the clock= argument is completely different and unrelated, even though it's being passed to the same function:

I think we should raise a warning in the trio.run(f, clock=1) case, because it is confusing and you can always work around it using trio.call(). People will need to avoid these sorts of constructs for a few releases no matter what approach we choose, because trio.run(f, clock=1) is currently valid and sets the clock for trio.run() rather than the argument for f().

njsmith commented 4 years ago

I think we should raise a warning in the trio.run(f, clock=1) case, because it is confusing and you can always work around it using trio.call(). People will need to avoid these sorts of constructs for a few releases no matter what approach we choose, because trio.run(f, clock=1) is currently valid and sets the clock for trio.run() rather than the argument for f().

Hmm, this could be a problem. So obviously, we'll need some temporary transition plan – probably phase 1, continue to accept clock= and friends as now but with a warning, phase 2, don't accept the old-style kwargs at all, phase 3, start accepting pass-through kwargs. But this should be a temporary thing.

If you think that trio.run(f, clock=1) should raise a warning indefinitely because it's confusing, then that pretty much nixes this approach: the point of having a general rule for matching kwargs with the target function is to make sure we can add new kwargs later without disrupting existing code. But if adding new kwargs to means that old usages become confusing and need to start emitting warnings, then we'll have failed at that.

oremanj commented 4 years ago

If you think that trio.run(f, clock=1) should raise a warning indefinitely because it's confusing, then that pretty much nixes this approach: the point of having a general rule for matching kwargs with the target function is to make sure we can add new kwargs later without disrupting existing code. But if adding new kwargs to means that old usages become confusing and need to start emitting warnings, then we'll have failed at that.

I'm not especially attached to "raise the warning indefinitely", but yes, that was my initial thought. Reasoning: People who wrote trio.run(f, newarg=1) to mean f(newarg=1) before trio.run() accepted a newarg= option will still have their code do the same thing as before. They get a warning when they upgrade, the fix for which is clear, and their code will never stop doing what they originally meant even if they ignore it. The target audience for the warning is people who write trio.run(f, newarg=1) after trio.run() starts accepting newarg. It encourages them to think about whether they mean trio.run(trio.call(f, newarg=1)) or trio.run(trio.call(f), newarg=1). While it's important that adding new options not break existing code, I think imposing a small burden on the minority of users calling a function with a kwarg that collides with the new option is an acceptable tradeoff in order to improve the experience for new users. But if you disagree, I think the overall proposal stands fine without the warning plan.

Alternative approach that makes the options case even more verbose but avoids the theoretical ambiguity:

trio.run(f, 42, kw="hi")
trio.run_options(trio.call(f, 42, kw="hi"), clock=MockClock())

More options for bikeshedding:

trio.run(trio.call(f, 42, kw="hi"), trio.options(clock=MockClock())
trio.run(f, 42, kw="hi", trio_options=dict(clock=MockClock()))

Comparison of a more complex example:

# dunders (optionally replace trio.call with partial)
nursery.start_soon(
    trio.call(
        trio.to_thread.run_sync,
        trio.call(token.run_sync_soon, foo, __idempotent__=True),
        __cancellable__=True,
    ),
    __name__="foo",
)

# my proposal
nursery.start_soon(
    trio.call(
        trio.to_thread.run_sync,
        trio.call(token.run_sync_soon, trio.call(foo), idempotent=True),
        cancellable=True,
    ),
    name="foo",
)

# functions take trio.call and trio.options
nursery.start_soon(
    trio.call(
        trio.to_thread.run_sync,
        trio.call(token.run_sync_soon_options, trio.call(foo), trio.options(idempotent=True)),
        trio.options(cancellable=True),
    ),
    trio.options(name="foo"),
)

# with _options on the function name
nursery.start_soon_options(
    trio.call(
        trio.to_thread.run_sync_options,
        trio.call(token.run_sync_soon_options, trio.call(foo), idempotent=True),
        cancellable=True,
    ),
    name="foo",
)

# with options bundled into a trio_options= kwarg
nursery.start_soon(
    trio.call(
        trio.to_thread.run_sync,
        trio.call(token.run_sync_soon, foo, trio_options=dict(idempotent=True)),
        trio_options=dict(cancellable=True),
    ),
    trio_options=dict(name="foo"),
)

I agree that the dunder approach is in some ways less confusing, but I think an approach that a user finds ugly will grate on them just as much as will an approach that they find mildly confusing, and possibly moreso long-term (confusion fades as understanding increases, ugliness not so much).

mentalisttraceur commented 4 years ago

I think the example @oremanj used in that last comment does a good job of showing why my pass_kwargs idea was not very good for consistently and generally solving this recurring problem throughout Trio's APIs.

Because as soon as we try to do that example with pass_kwargs, the actual options have to be partially applied to the underlying Trio function first, then that partial object has to be ran through pass_kwargs.

Granted, the pass_* functions were meant to just cover the common case and the case where you are calling the same API enough to want shorthand for your common cases, but I think the weakness versus the other proposals here for examples like the one just given is worth putting into focus.

I think having seen and considered this, I probably/tentatively think my pass_* suggestion is not a sufficient solution by itself for what users are wanting here, even if it is used as part of the solution.

njsmith commented 4 years ago

Some discussion on typing-sig about PEP 612, and what it will/won't let us do for typing functions like trio.run:

https://mail.python.org/archives/list/typing-sig@python.org/thread/AP3PQVD2QT2WB7XIKG5RFZ3DO6OXL5ZH/

mentalisttraceur commented 4 years ago

Remember the idea someone else mentioned to add an .options attribute onto every Trio method or function that takes the options, and returns a callable which just does that operation with those options applied?

I can't help but notice that with that approach, the nested example given by @oremanj looks like this (unless I misunderstood what that nested example does, but if so then I think that is a big point against those other methods):

nursery.start_soon.options(name="foo")(
    trio.to_thread.run_sync.options(cancellable=True),
    token.run_sync_soon.options(idempotent=True),
    foo,
)

Which I think is just much nicer than any of the other alternatives presented. In particular, it's very clear and impossible to make unclear that the options are options and which thing they are options to.

The implementation can look like this:

# Using PEP-570 positional-only parameters:
def _run(options, async_fn, /, *args, **kwargs):
    ...
# Or portably (probably okay for internal functions?):
def _run(*args, **kwargs):
    options, async_fn, args = args[0], args[1], args[2:]
    ...

def run(async_fn, *args, **kwargs):
    return _run(_default_options, async_fn, *args, **kwargs)

def _run_options(clock=None, instruments=(), restrict_keyboard_interrupt_to_checkpoints=False):
    options = ...  # however this is done
    return functools.partial(_run, options)

run.options = _run_options

Options should probably be an attrs class. Doesn't really matter, but having it as a well-defined class would also allow for a nice internal decorator so that all of the above could be reduced to:

@attr.s
class _RunOptions:
    """docstring that we want on ``run.options``"""
    ...

@_options(_RunOptions)
def run(options, async_fc, /, *args, **kwargs):
    """docstring that we want on ``run``"""
    ...

(I can draft up an implementation of _options if there is interest in going this route.)

njsmith commented 4 years ago

@oremanj

I agree that the dunder approach is in some ways less confusing, but I think an approach that a user finds ugly will grate on them just as much as will an approach that they find mildly confusing, and possibly moreso long-term (confusion fades as understanding increases, ugliness not so much).

Do you find that writing def __add__(self, other): ... or referencing obj.__name__, etc., grates on you and makes you dislike using python?

@mentalisttraceur If we do go that way then I think we can make the implementation simpler by having the decorator package up the callabe+args into a single argument for the underlying function:

@attr.s(frozen=True, slots=True)
class takes_callable:
    wrapped = attr.ib()

    def __call__(self, fn, *args, **kwargs):
        return self.wrapped(partial(fn, *args, **kwargs))

    def options(self, **options):
        return lambda fn, *args, **kwargs: self.wrapped(partial(fn, *args, **kwargs), **options)

@takes_callable
async def run(async_fn, /, option1=None, option2=None):
    ...

(This might even be typable using the PEP 612 draft notation? I'm not 100% sure.)

It still has the downsides of being hard to document, and I think the use of )( may be highly polarizing, i.e. some people think it's totally fine and others will experience a brain reboot when they try to parse it. (A lot of people find higher-order functions unintuitive and awkward to think about. Also I'm pretty sure I saw Guido somewhere saying that he found anything that used )( intolerable.) The )( issue could be somewhat mitigated by making it foo.options(...).call(...), though I dunno what the best name for the intercalary method would be.