Multiple concurrent async tasks on actors

Dessix commented 11 months ago

While the documentation around it is quite sparse, the current implementation notes that long-running asynchronous tasks "block" the actor's message handler, and that delays or similar should be translated to timers and message events.

For use as a sort of in-app microservice architecture, this makes it difficult to run a long-running task as an actor. For example, a graceful stop request could be handled by having the message handler trigger a cancellation token stored in the actor's state, but the message pump is not polled for as long as the actor's state is claimed by the running handler.
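To make the problem concrete, here is a minimal sketch using only the standard library. It is not ractor code: a thread with an `mpsc` channel stands in for the actor and its mailbox, a loop of short sleeps stands in for a long await, and the `Msg` type and `run_blocking_actor` function are hypothetical names invented for the illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical message type for this sketch.
enum Msg {
    /// Run a long task made of `steps` short sleeps.
    Work { steps: u32 },
    /// Ask the actor to stop.
    Stop,
}

/// Runs a toy actor whose handler performs the long task inline,
/// and returns the order in which events were observed.
fn run_blocking_actor() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel::<Msg>();

    let actor = thread::spawn(move || {
        let mut log = Vec::new();
        // The "message pump": exactly one message is handled at a time.
        while let Ok(msg) = rx.recv() {
            match msg {
                Msg::Work { steps } => {
                    log.push("work started");
                    for _ in 0..steps {
                        // Stand-in for a long await; the mailbox is not polled here.
                        thread::sleep(Duration::from_millis(10));
                    }
                    log.push("work finished");
                }
                Msg::Stop => {
                    log.push("stop handled");
                    break;
                }
            }
        }
        log
    });

    tx.send(Msg::Work { steps: 5 }).unwrap();
    // Sent right away, but it sits in the mailbox until the work completes.
    tx.send(Msg::Stop).unwrap();
    actor.join().unwrap()
}
```

Calling `run_blocking_actor()` shows that the stop request is only seen after the work has run to completion, which is the behaviour described above.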

Potential alternatives

- Futures-esque combinators atop timers to allow network-serializable, pollable events
  - Wrap any future in a "wake me and come back to this point"
  - yield(myState)-like construct required in order to allow concurrency by temporarily returning state control to the runtime
  - Simplifies "correct" usage of the current API, but doesn't follow Rust conventions on asynchrony
- Externally accessible state stored in shared references to hold cancellation tokens
  - Bypasses the messaging pump, weakening the communication model
- Actor-level first-class support for graceful cancellation tokens
  - Partially solves the microservice case
  - Doesn't provide for inter-microservice communication over the message system, thus weakening the communication model

Additional context

The root of the issue is that concurrency is limited by mutable access to Actor::State. Bevy's scheduling approach of reader / writer isolation may also be of interest here, in that queries which read a resource may be concurrent, while queries which write to that resource must be run in isolation from each other.
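As a rough illustration of reader / writer isolation (not Bevy's scheduler and not a ractor API), the sketch below uses `std::sync::RwLock`. Several readers may hold the state at once, while a writer needs exclusive access. The function name `reader_writer_isolation` is hypothetical.

```rust
use std::sync::RwLock;

/// Returns (writer blocked while readers exist,
///          reader blocked while a writer exists,
///          final value of the state).
fn reader_writer_isolation() -> (bool, bool, u32) {
    let state = RwLock::new(0u32);

    let writer_blocked_by_readers = {
        // Two readers can hold the state concurrently.
        let r1 = state.read().unwrap();
        let r2 = state.read().unwrap();
        // A writer cannot get in while any reader is active.
        let blocked = state.try_write().is_err();
        let _ = (*r1, *r2);
        blocked
    };

    let reader_blocked_by_writer = {
        // With no readers left, the writer gets exclusive access.
        let mut w = state.write().unwrap();
        *w += 1;
        // Readers are locked out for as long as the writer holds the state.
        let blocked = state.try_read().is_err();
        blocked
    };

    let value = *state.read().unwrap();
    (writer_blocked_by_readers, reader_blocked_by_writer, value)
}
```

An actor runtime would need the equivalent distinction at the handler level: handlers that only read `Actor::State` could overlap, while handlers that mutate it would still run one at a time.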

jsgf commented 11 months ago

So is cancellation the main reason the factories don't work for you?

slawlor commented 11 months ago

So a few things to note that might address what you've outlined here.

- We have a cancellation token, of sorts, which is to kill() the actor. That will interrupt work at the next async await point, cancelling the downstream tasks. However, this isn't a super clean pattern if, for example, you have custom initialization logic that needs to be cleaned up.
- Similar to @jsgf's point, this sounds like a use-case for a factory of sorts. The factory itself is never blocked on async work, as it's just a glorified job scheduler, and the workers can be blocked for short or long periods. If your work consists of many long, asynchronous tasks running concurrently, it's perfectly valid to have thousands, tens of thousands, or hundreds of thousands of workers in the factory. The only concern with a high worker count is that, if the message rate is also very high, the factory may be unable to keep up with scheduling incoming messages and start building a backlog.
- A "mutex" actor which issues multiple read locks to shared state is also a valid use-case, and is something that's commonly built. However, that's a higher-level building block than the runtime itself. Doing anything with shared state in an actor makes ownership quite difficult to reason about, whereas Rust gives us a simple ownership model in which a mutable borrow guarantees that no one else holds a reference, so we're free to mutate as we see fit.

If you just want a long-running task actor that still offers a higher degree of interactive control, you have basically 3 options:

1. Factories, as stated above, though they are overkill for just a single task.
2. The supervisor model, where the supervisor doesn't execute any operation directly, but manages the child "worker" (so a factory of concurrency level = 1) and can directly reply to coordination messages (stop/start/restart/etc). It can interrupt the child via the kill() primitive, capture the death via the supervision callback, and then do whatever it wants (restart, exit, etc).
3. Just spawn a long-running task and keep the join handle in the actor's state, freeing the message pump. This requires more manual management for things like panics and whatnot, but for simple use-cases it is probably the fastest and a safe enough route. The actor can respond to messages and simply abort the task when it wants.
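The last option can be sketched with the standard library alone. This is an analogue under stated assumptions, not ractor code: an OS thread stands in for the spawned async task, an `mpsc` channel stands in for the mailbox, and because threads cannot be aborted the sketch uses a cooperative `AtomicBool` flag where an async runtime would abort via the join handle (for example `tokio::task::JoinHandle::abort`). All type and function names here (`Msg`, `State`, `spawn_task`, `run_actor`, `demo`) are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical coordination messages.
enum Msg {
    Start,
    /// Replies with whether a background task is currently held.
    Status(mpsc::Sender<bool>),
    Stop,
}

/// Actor state: only the handle and the cancel flag live here.
struct State {
    cancel: Arc<AtomicBool>,
    task: Option<JoinHandle<u64>>,
}

/// The long-running work, run outside the message handler.
fn spawn_task(cancel: Arc<AtomicBool>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut ticks = 0u64;
        // Cooperative cancellation: check the flag between units of work.
        while !cancel.load(Ordering::Relaxed) {
            thread::sleep(Duration::from_millis(5));
            ticks += 1;
        }
        ticks
    })
}

/// The message pump. Every handler returns quickly, so the mailbox
/// keeps being polled while the task runs.
fn run_actor(rx: mpsc::Receiver<Msg>) -> Option<u64> {
    let mut state = State {
        cancel: Arc::new(AtomicBool::new(false)),
        task: None,
    };
    while let Ok(msg) = rx.recv() {
        match msg {
            Msg::Start => {
                if state.task.is_none() {
                    state.task = Some(spawn_task(Arc::clone(&state.cancel)));
                }
            }
            Msg::Status(reply) => {
                let _ = reply.send(state.task.is_some());
            }
            Msg::Stop => break,
        }
    }
    // Graceful shutdown: signal the task, then wait for it.
    // Panic handling is the manual part: join() surfaces a task panic.
    state.cancel.store(true, Ordering::Relaxed);
    state.task.take().map(|h| h.join().expect("task panicked"))
}

/// Starts the task, queries the actor while it runs, then stops it.
fn demo() -> (bool, bool) {
    let (tx, rx) = mpsc::channel();
    let actor = thread::spawn(move || run_actor(rx));
    tx.send(Msg::Start).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Status(reply_tx)).unwrap();
    let answered_while_running = reply_rx.recv().unwrap();
    tx.send(Msg::Stop).unwrap();
    let task_result = actor.join().unwrap();
    (answered_while_running, task_result.is_some())
}
```

In `demo()`, the status query is answered while the task is still running, and the stop request takes effect without waiting for the task to finish on its own. The trade-off noted above remains: the actor, not the runtime, is responsible for noticing panics and cleaning up the task.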

slawlor commented 11 months ago

closing as answered, feel free to re-open if you have additional concerns

slawlor / ractor

Multiple concurrent async tasks on actors #133
