arthur-tacca commented 2 years ago

Update

I have now put together my idea of a task-like class into its own package: aioresult. It includes a ResultCapture class that runs a function and stores the result (like the Task class below) and also a Future class for when you want to manually set the value. It has functions for waiting (though normally just using a nursery would do); the wait_any() and wait_all() are slightly different from those in trio-util, but trio-util ones would also work if you passed ResultCapture.run() to them.

Original post

It would be great to have a Trio Task class, a bit like asyncio.Task or any number of other frameworks' task classes.

This could be a core part of Trio itself and I think I've seen it requested before, but I've realised it could actually be a fairly simple external class so could be ideal for trio-util.

The key idea is that, in this class, the task gets run in a user-supplied nursery. There are a couple of ways to do this, but I think the most natural is to make the user run the task separately from creating it, via an async run() method that actually runs the task and that you can pass straight to Nursery.start_soon().

Here's a simple example that shows how you'd use this hypothetical Task object to run a few coroutines in a nursery, pretty similarly to usual, but then inspect their results later:

tasks = {url: Task(my_async_fetch_fn, url) for url in url_list}
async with trio.open_nursery() as nursery:
    for t in tasks.values():
        nursery.start_soon(t.run)
for url, t in tasks.items():
    print(f"At {url} got: {t.result}")

This would satisfy a few common requests made for Trio, most notably for an equivalent of asyncio's gather() function or for Nursery.start_soon() to give a way to get the return value of the async function. Actually, this is better than gather() because you're not forced to use a list, as you can see from the example above. It also nicely complements the wait_any() and wait_all() functions in trio-util (although it still would mainly be used with nurseries as in the above example).

Here's a really minimal implementation (it wouldn't handle enough cases for real use but does illustrate the idea):

class Task:
    def __init__(self, routine, *args):
        self.routine = routine
        self.args = args
        self.result = None
    async def run(self):
        self.result = await self.routine(*self.args)

Extra features

As I said, the above implementation is absolutely minimal. There are quite a few extra features I think could be useful:

Attributes exposed as properties: t.result should be a read-only property rather than a directly exposed attribute (and so should anything else that's part of the public API, e.g. t.args).
Run twice check: There should be a check (assert?) that run() isn't called twice for a given instance.
Record whether completed: The task should note internally whether it has completed (using try/finally in run()).
- Then there could be a t.is_completed property (or method?).
- Accessing the t.result property should throw an exception if the task hasn't completed yet (TaskNotCompletedException?).
- You could wait for the task to complete with await t.wait_complete(). This could just work off of a trio.Event internally.
- One option is to combine the interfaces for getting the result and waiting for completion: result = await t.wait_completed(). This is most like the asyncio Future class. Then you'd probably still want a sync API t.result_nowait() or similar. But it seems cleaner to me to keep getting the result and waiting for completion separate, and leave it up to the caller to compose them together if they wish (explicit is better than implicit).
Record finishing in exception: If the task finishes in exception, that should be recorded too.
- If t.result is accessed after the task finishes with an exception, the obvious thing to do is raise an exception from the property access. But this should definitely be wrapped in an outer exception (the original exception was already raised somewhere, so it doesn't make sense to raise it again in its original form; also, if the task was cancelled, you don't want that trio.Cancelled raised somewhere else because it won't be in the correct nursery). Perhaps it could be called ExceptionWasRaisedByTaskException? Hmm, maybe not...
- Perhaps there should be a t.is_completed_with_exception property? Or perhaps it's best to make the API for that simply accessing t.result and catching the exception.
kwargs: You could allow passing **kwargs as well as just *args. This solves another common request for Nursery.start_soon(). It does restrict what you could do with the interface to Task in future though (then again, a static factory method could be added if needed to work around that).
Monitor whether started: Allow using the Nursery.start() protocol, perhaps as a subclass of Task. Then users can wait for the task to start using await t.wait_started() and access the start return value via t.start_result.

Here a couple more for completeness, even though personally I don't like them:

Individual cancellation: The task could have its own cancel scope and cancel() method (or just property access to the cancel scope). I'm actually against this one though, as it adds overhead to every task and you could just cancel the nursery that you've put them in.
Start in constructor: A slight variation on the interface would be to pass a nursery to the constructor, which would then call nursery.start_soon(self.run) itself. That would save a bit of code in the caller, but personally I like the separation of the two steps (again, explicit is better than implicit), and it makes it obvious that nursery.start_soon() is still the right way to start things.
Task runner: (Suggested in one of the issues below.) A separate helper class that wraps a nursery, but whose start_soon() returns a task (instead of None). But I think just targeting the Task class at regular nurseries is much more useful (technically the task runner class doesn't prevent that, but it would confuse things IMO).

Relevant past issues

In trio-util:

#7 (as_completed() function) is somewhat related, as it's also a way to get the result of multiple functions running concurrently.

In trio:

python-trio/trio#892 mentions tasks (although tasks seem to be a red herring for that poster's actual problem). In it, njsmith includes example code for ManagedTask which is not that different from my suggestion, and ManagedTaskRunner (mentioned above).
python-trio/trio#410 is about Nursery.start_soon() returning tasks – it says it actually did in early versions of Trio, but that was deliberately removed.
python-trio/trio#2188 is a direct request for an analogue of asyncio's gather() function.
python-trio/trio#1373 asks about returning a Trio Task object from Nursery.start_soon() (but Trio's own task class doesn't support much of this functionality anyway).
python-trio/trio#421 is about documenting how to collect results from tasks and python-trio/trio#472 is about that and other tutorial examples.
python-trio/trio#467 is partly about futures, which is a bit different from this (the result is set manually rather than captured from a coroutine) but has similar interface for fetching the result.

In other libraries:

trio-future is an implementation of this idea, with some differences to the interface.
dabeaz/curio#342 (and linked https://bugs.python.org/issue43736) complains that asyncio ought to require await when starting a task even if you don't actually want to wait for anything (and notes that Curio does require this spurious await). That's not really related to this, but there's a side discussion that creating a task shouldn't immediately start it, which is true for this suggestion.
jeepney!11 includes a Trio Future class implementation (initial low-level implementation; later revision using Event).

Elsewhere:

The reason I wanted something like this in the first place was so I could wait for completion of a task being spawned in a different nursery (I don't even want the result!) – see this gitter thread.
Relevant StackOverflow questions: Capture the return value from nursery objects; How to gather task results in Trio?; Future/Promise like stuff for Trio in Python?

arthur-tacca commented 2 years ago

I've put together an implementation:

https://gist.github.com/arthur-tacca/32c9b5fa81294850cabc890f4a898a4e

I've renamed it ResultCapture based on feedback in Trio issues, which I think nicely stresses that it's about getting the coroutine result rather than complex machinery for interdependent tasks.

Is there any interest here? Would it be worth me putting together a pull request?

Edit: Now in its own library: https://github.com/arthur-tacca/aioresult

belm0 commented 2 years ago

Hi-- sorry, I had mistakenly dropped taking a look at this from my TODO's.

Our application has pushed Trio fairly hard for 3 years (now 100k lines of code), and I haven't come across this kind of case enough to encapsulate it.

I was focusing on your original use case a little:

The reason I wanted something like this in the first place was so I could wait for completion of a task being spawned in a different nursery (I don't even want the result!) – see this gitter thread.

task = handler_nursery.start_soon(myhandler)
await task

I think it could be covered by just a utility function and the task_status protocol:

async def done_wrapper(f, *args, task_status=trio.TASK_STATUS_IGNORED):
    event = trio.Event()
    task_status.started(event)
    try:
        await f(*args)
    finally:
        event.set()

done = await handler_nursery.start(done_wrapper, myhandler)
await done.wait()

arthur-tacca commented 2 years ago

Thanks for looking at this @belm0

I was focusing on your original use case a little: ...

You're totally right that in my original use case I don't need 90% of what I'm suggesting. A wrapper around the handler that sets an Event is all that's needed. As you can see from that gitter thread, currently I'm just doing that as part of the wider function that uses the handler, but it felt a bit messy to mix that up with its core logic. (In truth, there's so little code that doing any refactoring at all has debatable value.)

Thanks for your done_wrapper() idea, I like it a lot. It avoids dumping all my code into one function, while being laser focused on actually solving my problem, rather than coming up with some super general API. Using the task_status protocol is a really clever way of achieving it.

There's certainly some interest in a general task / result capture class (as all the links in my post show, and actually it came up again today on another Trio gitter thread). But it's clear you're not interested in it in your library, which is totally fair enough, especially since there's a lot of debate about what the design would be (also clear from that thread). The gist I posted earlier exists if anyone wants to use it, and I might try to publish my code as a standalone package on PyPI (if I magically find some free time). So I'll close this issue.

groove-x / trio-util

Enhancement request: Task class #20
