Get task retvals from the nursery after it finishes (aka "fake-gathers considered harmful")

python-trio / trio

Trio – a friendly Python library for async concurrency and I/O

https://trio.readthedocs.io

Other

6.13k stars 335 forks source link

Get task retvals from the nursery after it finishes (aka "fake-gathers considered harmful") #2633

Open dubslow opened 1 year ago

dubslow commented 1 year ago

I haven't seen a way to do this in the docs, and it certainly seems very logical/thematic if it were to work like this. The nursery is the starter and finisher of tasks, its sole and entire purpose is to start tasks and ensure they finish in some halfway-sensible way, with some sort of sane control flow.

Therefore it's only logical that the return values of those tasks are collected by the nursery. Return values shouldn't disappear into the void any more than errors. The nursery gathers errors, so it should gather retvals too.

I find myself wanting to gather return values from my tasks somewhat akin to the following, taken from the docs: running a process for an arbitrary amount of time which dynamically spawns an arbitrary number of tasks.

A little bit of searching finds me a recipe to kinda back my way into this functionality faking out a sort of trio.gather, but it seems to me to be a pretty basic primitive of structured concurrency, and if it's common enough to be a recipe, then it's common enough to implement once and get right instead of rewriting it (or at least repasting and in-place-modifying it) every time you call a nursery.

But then I'm quite new to asyncio and structured concurrency. All I know is that I greatly enjoyed NJS's blog post on control flow, and have equally enjoyed my first 100 lines of code. But I would love even more to either 1) have this primitive implemented in trio 2) understand why it's not a primitive in some way. It sure sounds like one: "nurseries spawn tasks, nurseries collect tasks, nurseries call functions, ergo nurseries collect those functions' retvals"

smurfix commented 1 year ago

if it's common enough to be a recipe, then it's common enough to implement once and get right

The problem is: What is "get right"? In many cases you want partial results from subtasks as soon as they're available. (Possible solution: Use a memory channel.) Sometimes you also need to associate task names/objects with the results (send a name+result tuple to the channel.) In others you need a list (write a two-liner that appends the result to an array) or associated with an object (store the result in the object, or write another two-liner that saves it to a dict.) In many others you don't need the return values at all and don't want to save a possibly-unbounded list of None values. And sometimes you need to combine any of the above with a concurrency limiter, which also might have to decide what to do about the return value.

The overall idea is that, confronted with the fact that no one-size-fits-all recipe exists (IMHO cannot exist), you think about the problem as you encounter it, and choose the method that best fits your situation instead of trying to shoehorn your problem into the one-size-fits-all recipe.

We probably should add this to the documentation somewhere. Assuming it's not in there already …

dubslow commented 1 year ago

In many cases you want partial results from subtasks as soon as they're available. (Possible solution: Use a memory channel.)

Of course I'm only talking about retrieving return values after the nursery has already closed. While the nursery is still running there are of course plenty of ways to do it, but that's not the issue.

(send a name+result tuple to the channel.)

See above. Or at least, I haven't figured out how to store what's written to a channel to read out after the nursery has closed -- besides, if I did have such a way, then I wouldn't need a channel in the first place, I could just directly store it.

In many others you don't need the return values at all and don't want to save a possibly-unbounded list of None values

probably the better default would indeed be for the nursery to simply discard return values from tasks, as they currently do.

but I would like the option for the nursery to store them and give me access after it closes. (it doesn't need to be fancy: just storing something like nursery.results[task].append(retval) would be a sound way to let the user get access. it is of course up to the user to choose some sensible, useful return value from their task. then we could write

async with open_nursery(store_retvals=True) as nursery:
    for blah in blah:
        nursery.start_soon(my_task, blah)
# exit `with`, all tasks now complete, nursery closed!
# typically (but not necessarily) we now return to a synchronous style
for retval in nursery.results[my_task]:
     collate_and_process_final_results()

And sometimes you need to combine any of the above with a concurrency limiter,

Is this not completely irrelevant after the nursery has closed?

The basic usecase I see is that IO typically means communication with some outside program not controlled by the user. In the general case, the interaction between the outside partner and the user may continue arbitrarily long, and the user may not ever know when it will end until suddenly it ends. Therefore getting the "final result" inside the nursery execution scope is, by definition, impossible. I just don't know how to pass results from inside the nursery to outside the nursery.

A5rocks commented 1 year ago

Hi! For your specific problem, I presume something like the pattern mentioned in https://github.com/python-trio/trio/issues/2632#issuecomment-1506837686 works for you?

(Where you just append the result to a list or add it to a dict or something)

If it does, should we just document this particular composition of features? If not, could you provide a more concrete example? I don't think I get what you want.

As for why not just add a kwarg or something, I'm more in favor of just using a pretty basic composition strategy like this cause it's just more flexible! Just like how we don't have timeout args (normally, except for like if there's multiple sub-timeouts such as making an http request) and instead have a context manager!

dubslow commented 1 year ago

Disclaimer: am noob, but I don't see how that makes the result available outside the with nursery: statement. Indeed as I edited into my previous comment, in principle our code may not know the final result until the nursery closes. Perhaps I'm not paying enough attention to cancellation semantics, because I suppose I'm saying that when a task is cancelled, it would clean itself up by returning its final result -- where the result isn't final until it's cancelled.

So perhaps I've been not stating my question correctly.

My head hurts lol

dubslow commented 1 year ago

Here's a toy example expanded from a throwaway in the docs. Obviously it doesn't run, and as I said in my previous comment I think it essentially doesn't work as I intended w.r.t. exceptions/cancellations, but that's the semantics I'm looking for...

https://gist.github.com/dubslow/449c7743e917b6130719df7be01b8b26

dubslow commented 1 year ago

And modifying that, I've come up with a toy example that actually works and does what I want it to:

https://gist.github.com/dubslow/379342c6dbc0efc69491fe9c8becc089

However I find this code to be extremely smelly, and a considerable anti-pattern. I would love for a better way to do this.

example output:

awaiting next connection: asdf
got text='asdf' from new conn
awaiting next connection: gg
got text='gg' from new conn
awaiting next connection: go next
got text='go next' from new conn
awaiting next connection: Hello, world!
got text='Hello, world!' from new conn
awaiting next connection: ^C
got text='' from new conn
After server closed, we've counted lifetime total conns=5 messages=9

A5rocks commented 1 year ago

Hmm.. I have to say I still don't quite get the problem. Like, this is what I think of when I think "asyncio.gather" to try to demonstrate what I mean by my previous pattern (I haven't tested this works, for what it's worth):

import trio

async def gather(tasks):
  results = {t: None for t in tasks}  # in-order results
  async def run(task):
    results[task] = await task()

  async with trio.open_nursery() as nursery:
    for task in tasks:
      nursery.start_soon(run, task)

  return results.values()

Looking at your toy example, I don't quite get how you expect to have the counters done for you? Sorry for any confusion and for taking a while to reply, got very distracted!

A5rocks commented 1 year ago

Ah sorry if I'm not really being clear on why I'm saying this:

I think I agree with @smurfix that there are so many specific things you might want (asyncio.gather (which has a boolean flag for operation), asyncio.as_completed, asyncio.wait (which overlaps with asyncio.gather with a 3-member enum flag for different types of operation)).

As such, I'd much rather people figure out this pretty core pattern and have it down. Going from "oh I need to do some work for every task" or "I need to prepare stuff before every task" or "how can I keep a outcome.Outcome for all run tasks" to "oh I can just use a function" is an instinct I'd personally rather users have (disclaimer: I'm not involved in library design :^) and I do think having docs that cover this case would be nice, as I do not remember the docs having something similar.

dolamroth commented 1 year ago

I think, you may just use https://github.com/tiangolo/asyncer It is built with anyio, so it's trio-compatible https://asyncer.tiangolo.com/tutorial/soonify-return/

dubslow commented 1 year ago

Hmm.. I have to say I still don't quite get the problem. Like, this is what I think of when I think "asyncio.gather" to try to demonstrate what I mean by my previous pattern (I haven't tested this works, for what it's worth):

Essentially, yes I'm talking about the gather pattern such as you write, but with some more reflection upon this thread and the many links in it, I think the use of such gather recipes is hiding the fact that trio silently discards part of the control flow of the nursery -- the return values of task-function. That is, nurseries silently ignore half of the function interface, and considering that the function is a core block of Structural Programming, nevermind Structured Concurreny, it seems to me that trio violates its core principle in so doing.

In other words, I find the gather recipe to be only a hack covering a basic flaw in trio nurseries. Self-implementing the semantics is error-prone, not remotely thread-safe for example, and is exactly what a library is supposed to do: do the hard, common stuff right so the user doesn't have to. This hack of referring to a variable outside the nursery execution scope is exactly what I referred to as an "extremely smelly ... considerable anti-pattern" in my previous comment. Implementing the task-function interface, including the collection of return values, is the nursery's job, not the job of the scope containing the nursery, as in your example or mine or a dozen others that can be found on the internet. It looks far too much like the use of a global variable in oldschool threading, and carries exactly the same flaws, namely thread safety among others.

I think I agree with @smurfix that there are so many specific things you might want (asyncio.gather (which has a boolean flag for operation), asyncio.as_completed, asyncio.wait (which overlaps with asyncio.gather with a 3-member enum flag for different types of operation)).

Being new to async in general, and having been persuaded by NJS's blogpost that I should therefore start with Structured Concurrency and ignore asyncio entirely, I'm not fully sure what those functions are.

That said, based on a quick Ctrl+F on the asyncio docs, they look considerably more complicated than what I say trio is missing. For instance, I believe that any talk of Futures automatically implies non-structured concurrency -- one of many goals of trio is to make such things redundant. Or for another example, as_completed looks like a crappy version of memory channels. Well this is certainly doing a good job of convincing me that NJS is right in the first place, and that trio is better than asyncio for writing concurrent code. (If by outcome.Outcome you mean this, then I'd say that looks exactly like an asyncio.Future, a non-structured item, and whoever wrote it doesn't understand the meaning of structured concurrency. But then again, am noob, so take me with a grain of salt.)

All I'm saying is that nurseries silently discard half of the interface of task-functions, and discarding function interfaces is precisely non-structured, the very thing that trio is supposed to not be. Nurseries should store task-function results, and make them available to the containing scope after the nursery has exited (after the with block). Not doing so discards half the task-function interface, breaking the ideal of structured concurrency. The whole point of structured concurrency is that I should be able to easily trace control flow, including data that crosses stack boundaries -- arguments and retvals -- at all times by merely glancing at the code.

Writing my own gather recipe by means of not-local-to-nursery variables is exactly the sort of anti-pattern that structured concurrency is meant to avoid. Re-phrasing it, we should be able to think of the with open_nursery() as nursery: block as its own function, its own scope so to speak, and thus all these gather recipes using closures, nonlocal references to stuff outside the nursery block, is exactly the sort of code that trio is supposed to prevent. And the nursery can, and should, store the return values for use after the with scope/block completes and exits, without use of "closures".

(I intended to not write a blogpost in a random github comment.... mission failed lol)

I think, you may just use https://github.com/tiangolo/asyncer It is built with anyio, so it's trio-compatible https://asyncer.tiangolo.com/tutorial/soonify-return/

That looks again like asyncio.Future or outcome.Outcome, the sort of thing that structured concurrency is supposed to prevent. If I need the results inside the nursery, I use memory channels, which is what channels were designed for. If I need the results after the nursery, then I shouldn't be writing that code inside the nursery with funny non-structured objects like Futures or Outcomes or SoonValues, which are of course all the same thing by different names, and which are all guaranteed to be completed anyways after the nursery exits.

dubslow commented 1 year ago

Let me try to summarize.

Nurseries are not yet complete implementations of structured concurrency: the fact that all these gather-like recipes use closures (at least in their shortest form they do), not to mention the fact that those closures cross synchronous-concurrent boundaries (!!), is a big hint that these gather-like recipes are in fact covering for a big flaw in nursery's implementation of structured concurrency. (Of course, one could simply pass the synchronous results collector into the nursery and its tasks, which would make it technically not a closure, but concurrent tasks all writing to a synchronous variable remains a seriously smelly anti-pattern, breaking the abstraction principles of structured concurrency, breaking the "black box" idea, no matter if it's technically a closure or not.)

I consider these fake-gather recipes to be harmful in much the same way that NJS (rightfully) considers go harmful.

A slightly longer redux:

Trio is meant to implement structured concurrency, where we can trace control flow (including stack/function call data) at any time, easily, by merely glancing at code.

Nurseries are the essential means of translating between synchronous and concurrent code, as per the neat little graphs in NJS's blogpost. The with ... as nursery: is the essential block: nurseries are in many ways akin to their own function call on the stack (despite not using Python's function call syntax), except it's a function call whose control flow arrows diverges into many such arrows in parallel, which nevertheless returns all arrows to a single point upon the exit of the with ... as nursery block. This is of course the entire purpose of structured concurrency, that all arrows have a start and an end -- and furthermore, nurseries have exactly one entry point and one exit point, no matter what the several arrows do inbetween. The nursery manages the transition from syncrhonous to concurrent and back to synchronous. (The "and back" part is where most other async libraries fail to conform to structured concurrency, as NJS writes -- dangling arrows, silently discarded exceptions etc.)

The trouble is that, as it currently stands, nurseries discard task-function return values, which leaves each of its internal arrow half-dangling. When the nursery exits we know where the control flow is, so the arrow is partially complete, however some of the function-interface data, some of the stack data, is silently discarded, breaking the "functions as black boxes" primitive implied by "structured". In effect, each arrow of the nursery is only half-completed, the other half discarded and dangling. Upon the exit of the with ... as nursery block, the nursery's data is gone, those task-function retvals never to be recovered. Loosely speaking, I can trace the flow of the instruction pointer, but not the data pointer. I have no way to communicate my results from inside the concurrent nursery block back to its containing synchronous scope, after all the parallel arrows have reconverged onto the single exit point of the nursery.

The use of a gather-like pattern, which are a dime a dozen on the web, with several examples here in this thread, is a hack by which one can fake their way into similar functionality. Nevertheless, it is not safe pattern, essentially by using a closure outside the with ... as nursery scope -- a closure crossing between synchronous and concurrent contexts!! -- to store those missing retvals for use after the with ... as nursery scope exits, after we reconverge from concurrent back to synchronous execution. Such closures break the "functions as black boxes" fundamental idea of structured concurreny, and is why they aren't thread safe, and generally considered bad practice, even in strictly-synchronous code. Closures break structure, and doubly so when such a closure crosses the synchronous-concurrent boundary as implemented by nurseries. In other words, much in the same way that "goto considered harmful" implies "go considered harmful", so too does "go considered harmful" imply "gather-like-recipes-doing-closures-across-nursery-scopes considered harmful".

All these gather recipes break the fundamental idea of structured concurrency. They harm the programmer's ability to abstract their code into components, which harms their ability to reason about their code and their ability to write it. Instead, since it's the nursery's job to manage the transition from one arrow to many arrows and back to one, it's also the nursery's job to pass data along both of those transitions. But, currently, the nursery passes data across one of those transitions, but not the other. (Specifically, nursery.start_soon accepts arguments which are forwarded to task-functions: here, the nursery does its job of passing data from the one single entry point into the many parallel arrows, but the nursery later discarding the retvals of the same functions means that no data ever passes from the parallel arrows back to the single exit point, breaking structured programming.)

TeamSpen210 commented 1 year ago

In regards to creating a function for each task, that works and is a quite powerful approach, but it does have the problem that these functions are inherently specialised and can't be reused. It also requires a full block syntax, so if you have a bunch of heterogeneous tasks it gets a bit awkward to make these every time.

outcome.Outcome is somewhat different to the other objects you've mentioned actually, because it's not based in concurrency. It's simply a container to hold either an exception result or return value, then unwrap that elsewhere. What I've been using personally in my own code is a similar sort of container to that as well as asyncer.soonify. The difference is that it won't let you access the result until the nursery is completed successfully, so the result must always be available.

dubslow commented 1 year ago

In regards to creating a function for each task, that works and is a quite powerful approach, but it does have the problem that these functions are inherently specialised and can't be reused. It also requires a full block syntax, so if you have a bunch of heterogeneous tasks it gets a bit awkward to make these every time.

I don't follow your meaning. Tasks are functions. Writing a task = writing a function.

outcome.Outcome is somewhat different to the other objects you've mentioned actually, because it's not based in concurrency. It's simply a container to hold either an exception result or return value, then unwrap that elsewhere.

ah... so basically a shorter spelling of a try: except Exception: clause?

What I've been using personally in my own code is a similar sort of container to that as well as asyncer.soonify. The difference is that it won't let you access the result until the nursery is completed successfully, so the result must always be available.

Ah yes, that sounds like what I say nurseries should already do -- exposing the results after the nursery is complete, after the nursery has returned us to a synchronous context. What does your code look like in this regard, inside the nursery? Presumably storing the return value of soonify or start_soon and then unpacking that after the nursery with block?

TeamSpen210 commented 1 year ago

The way outcome works is that you use the unwrap() method, which either returns a value when successful , or re-raises the exception if it failed. Trio uses them internally for tracking results a bunch. For my helper, it's just used like so:

async with trio.open_nursery() as nursery:
    # internally, nursery.start_soon(some_task, arg1, arg2)
    func_result = Result(nursery, some_task, arg1, arg2)
print(func_result() + 5)

It's also got a sync() classmethod to delegate to trio.to_thread.run_sync, since I use that a lot. Currently it's abusing the CancelScope repr() to be able to check if the nursery has completed successfully, I should open an issue for exposing that properly.

dubslow commented 1 year ago

Ah, very nice. That is indeed approximately the functionality that a nursery should implement by default. Ultimately I think it shouldn't be necessary to store some new variable from within the nursery, since what you have looks pretty close to a Future to me, but this is indeed fairly clean, at least from a user perspective.

Probably the smelliest thing about it is indeed the abusive way to track the nursery status. My "cleanest" hack stores the results on the nursery itself, which obviates that need, and removes the "storing a new variable inside the nursery bit", but then my hack still looks like several threads smashing a single global, which your faux-Future doesn't look like, instead leaving collation/serialization until after nursery exit.

Still would be preferable, and cleanest, to have this sort of semantics in nurseries from the start of course. We shouldn't have to add our own code, no matter the form, in order to get fully structured concurrency.

smurfix commented 1 year ago

I don't follow your meaning. Tasks are functions. Writing a task = writing a function.

A task is a function with some arguments that you start at a specific time. One function = any number of tasks (including zero). Thus, using the function as an index for the task's result doesn't always work.

what you have looks pretty close to a Future to me

Well, no. A Future is created, then fed with a result/exception some time later, then you wait on it, then you access the result or exception. An Outcome conflates the first two steps and cannot be waited on.

Yes a nursery could save the results of all those tasks. But how do you access them later? Sequentially? (By task start or by task stop?) By some key? Where and based on what do you create that key? Or simply by returning a Future object from start_soon? Is an exception a "result", and does it cancel the nursery, as it does now, or not? Do you need to be able to cancel the task? Is the task auto-cancelled when the object that's going to capture its result is no longer accessible, and if so what happens if it raises someting that's not a Cancelled exception?

The answer to these questions is highly application specific, as is the "OK I have five nurseries and exactly one of these needs to go through this dance while two others absolutely must not" problem.

It's rather easy to write a nursery wrapper that answers each of these questions in one specific way. Writing one that does all of the above (and believe me I've needed all of them during the last few years) might be possible, but then you need to decide each time which variant you want (including the don't-save-a-result-dammit version). Far easier, IMHO, to have a nursery that drops results on the floor, and write a local, small and obvious helper that saves your results to wherever when you need that.

dubslow commented 1 year ago

Yes a nursery could save the results of all those tasks. But how do you access them later? Sequentially? (By task start or by task stop?) By some key? Where and based on what do you create that key?

By default, a nursery should do the simplest thing, which is to simply sequentially store for them for access after the nursery exits. Anything fancier can be done by the end user encoding their retvals as they need. In other words, you're overthinking it. The nursery just needs to provide any retval at all, and the user can build atop that with no problem. Currently, there's nothing to build atop, which is the issue: violation of structural programming.

Or simply by returning a Future object from start_soon?

The whole point of this thread is that the return values are meaningless until the nursery has exited (reconverged from concurrent to synchronous). A format like this gives the wrong idea.

Is an exception a "result", and does it cancel the nursery, as it does now, or not? Do you need to be able to cancel the task? Is the task auto-cancelled when the object that's going to capture its result is no longer accessible, and if so what happens if it raises someting that's not a Cancelled exception?

Well frankly this these are orthogonal questions. As it is, users have no way of (easily) customizing cancellation of a nursery, so these questions are kind of moot. Ultimately, I don't see anything wrong with the simplest answer: the nursery stores any retvals it gets, and those are available to the user after the nursery has exited. Any tasks which were interrupted, which never returned, are therefore not in the results, and it's up to the user to handle exceptions, exactly as currently. The answer to these questions is completely independent from the nursery storing task retvals.

The answer to these questions is highly application specific, as is the "OK I have five nurseries and exactly one of these needs to go through this dance while two others absolutely must not" problem.

Nurseries are little customizable as-is, so your concern is already a concern, quite independent of the question of nurseries breaking structured data flow or not.

It's rather easy to write a nursery wrapper that answers each of these questions in one specific way.

"nursery wrapper" is exactly the sort of phrase that should clue you that something is wrong. Wrappers crossing the synchronous-concurrent boundary, i.e. crossing into and out of the with ... nursery: scope, are exactly the sort of anti-pattern which breaks structural programming. All I'm asking is that nurseries respect structural programming, as they're supposed to. Nothing else changes, it's no earth-shattering revelation.

Far easier, IMHO, to have a nursery that drops results on the floor, and write a local, small and obvious helper that saves your results to wherever when you need that.

There is nothing that is remotely obvious about any of the hacks listed in this thread. The whole point of this thread is that they are hacks, harmful anti-patterns which degrade programmer ability to abstract and reason about the code. Dropping stuff on the floor is the worst thing a nursery can do. Seriously, Structured Programming 101 right here.

smurfix commented 1 year ago

By default, a nursery should do the simplest thing, which is to simply sequentially store for them for access after the nursery exits.

Which sequence – task start or task exit? What about nurseries that run forever, do they contain a possibly-infinite array of None values?

Why build that into a nursery when you can add three lines of code to wherever you actually need it?

res = []
async def save(p,*a):  # store task results, task-exit order
    res.append(await p(*a))
async with open_nursery() as n:
    n.start_soon(save, fn, args…)

The whole point of this thread is that the return values are meaningless until the nursery has exited (reconverged from concurrent to synchronous). A format like this gives the wrong idea.

Well, my point is that sometimes you need the values as they are generated, sometimes you don't need the overhead of saving them, and sometimes saving them is actually harmful.

"default to saving results" thus causes your code to work quite happily when you test it, but to eat an unbounded amount of memory in production unless you add an explicit option telling it not to. Talking about anti-patterns, that's a fairly major one for me.

Well frankly this these are orthogonal questions

If you want an exception to be saved in an Outcome object for later perusal you need a function just like the aforementioned save, except that it calls outcome.acapture(p,*a). Ditto (more or less) for answering the other questions in that block.

Thus, as soon as you get around to implementing any of this, it's hardly orthogonal.

"nursery wrapper" is exactly the sort of phrase that should clue you that something is wrong.

Might I suggest that you re-evaluate what I wrote?

A nursery wrapper for me is simply a class that contains a nursery (as opposed to a nursery subclass, which Trio doesn't allow – for reasons I don't entirely agree with, but that's orthogonal to this topic) which passes everything through to the "real" nursery, except that it augments start_soon and start in order to add the aforementioned save method/function. If this method stores the result in a nursery_wrapper.results attribute you get exactly what you're advocating for.

I for one am perfectly happy to add that simple save, to those start_soon or start calls that actually need them. Explicit is better than implicit, simple is better than complex, and all that. IMHO.

dubslow commented 1 year ago

Which sequence – task start or task exit?

Considering that the former is nonsensical (saving a value that doesn't exist yet), the latter.

What about nurseries that run forever, do they contain a possibly-infinite array of None values?

That's up to the user to not launch an infinite number of tasks.

Why build that into a nursery when you can add three lines of code to wherever you actually need it?```

As I've repeatedly stated (but allow me to repeat again), this "pattern", which is in fact an anti-pattern, is extremely harmful. Your res variable crosses the synchronous-concurrent boundary which is the whole purpose of nurseries to enforce. In other words, this code snippet breaks the whole idea of nurseries. It ruins programmer ability to abstract and reason about the code, leading to bugs. Not surprisingly, it's also literally dangerous in the sense of not being threadsafe, being equivalent to many threads accessing a global variable at the same time.

(As in, if one of your tasks is a trio threading wrapper around an external synchronous function, any such trio thread will race with the native-trio tasks in the event loop, smashing the res variable. It looks simple, but is in fact very much complex in operation, which is part of what makes it so harmful.)

Well, my point is that sometimes you need the values as they are generated, sometimes you don't need the overhead of saving them, and sometimes saving them is actually harmful.

"default to saving results" thus causes your code to work quite happily when you test it, but to eat an unbounded amount of memory in production unless you add an explicit option telling it not to. Talking about anti-patterns, that's a fairly major one for me.

The entire purpose of a nursery is to enable many ~independent tasks in parallel. The user can spawn as many tasks as they like to consume as much memory as they like. Nurseries fully implementing structured programming doesn't change the fact that they already allow the user to do bad memory management. What it does change is the ability to avoid harmful global-like patterns such as the one you noted. Your other concerns already exist as-is.

If you want an exception to be saved in an Outcome object

Huh? Since when are exceptions special? Exceptions cancel the nursery currently, and that wouldn't change.

Might I suggest that you re-evaluate what I wrote? A nursery wrapper for me is simply a class that contains a nursery

Indeed, which is to say, it's not much different from your save and res pattern, abstractly speaking. It breaks the synch-concurrent boundary by "containing" a nursery and implementing new semantics both inside and outside that nursery. And by crossing that boundary becomes dangerous, harmful, breaking structured programming.

I for one am perfectly happy to add that simple save, to those start_soon or start calls that actually need them. Explicit is better than implicit, simple is better than complex

This save pattern is not simple, but is in fact complex, dangerous and harmful. The appearance of simplicity is part of its harmfulness. It is not simple at all! It's like a global variable, and carries all the dangers of global variables in a threaded context. Don't let the fact of "async event loops are single-threaded" confuse you into believing that this save pattern is somehow safe or good. It mixes concurrent and synchronous code in the worst possible way, in exactly the way that trio tries to prevent. This save pattern violates everything that trio is supposed to stand for. And by enabling users like you to think that this is okay, or even good, is why this gap in nurseries should be fixed ASAP, IMO. Fake-gathers considered harmful.

(As far as customizing nurseries, I tend to agree with you that it would be nice to customize certain aspects of them, most notably in terms of cancellation policy, but I don't know how cancellation works anyways so I can't speak to that. And this is off-topic anyways.)

smurfix commented 1 year ago

Which sequence – task start or task exit?

Considering that the former is nonsensical (saving a value that doesn't exist yet), the latter.

What's nonsensical about that? When I start task A and then task B, I know which order I started them in but not which order they finished. So I might want to access task A's result by accessing nursery.result[0].

Your res variable crosses the synchronous-concurrent boundary which is the whole purpose of nurseries to enforce.

Well, that's your opinion. Mine is that while control flow and data flow are of course causally related, they're otherwise independent concepts, and I see no good reason why a nursery's job should be to enforce any aspect of the latter.

dubslow commented 1 year ago

Well "enforce" is always a strong word, Python users can easily shoot themselves in the foot with any code if they try hard enough, but in this case I just think that nurseries should give dataflow the same support as controlflow. It should be possible for users to easily obey the rules of structured concurrency (even if Python doesn't really "enforce" rules properly).

But I appreciate that we can politely agree to disagree :+1:

tapple commented 9 months ago

I disagree with the entire premise of this issue: that nurseries are like a function call. I think they are not. I think they are more comparable to an if or for statement. an if does not have a return value. nor does a for loop. Nurseries are not a language feature at this time, so they are built out of something that is: functions. However, the functions that make up a nursery are an implementation detail, and nurseries are under no obligation to behave like functions and have a return value

Every language has a distinction between expressions and statements. expressions have a value once they are finished running; statements do not. function calls, literals, variables are expressions. if blocks, for blocks, with blocks, and nurseries are statements. (This is the distinction between the old a = b code and the new a := b code. The first is a statement, the second is an expression)

In one language I'm aware of (Smalltalk), "if" and "for" are expressions that have a return value. If trio were a Smalltalk library, I would agree with this issue. But, trio is a python library, where control-flow features are statements, not expressions (this is true of every language I'm aware of, except Smalltalk). Therefore, I think nurseries being statements in trio-on-python is consistent with the rest of python, and therefore, not a bug