riker-rs / riker

Easily build efficient, highly concurrent and resilient applications. An Actor Framework for Rust.
https://riker.rs
MIT License
1.02k stars 69 forks source link

Receive functions that return results #125

Open olanod opened 4 years ago

olanod commented 4 years ago

This is also more of a question since I'm still figuring things out and wanted to know more about the error handling practices of Riker. Wouldn't it be better if receive functions can return a Result? e.g. an anyhow::Result or Riker special result with errors that give hints of severity to know if the message should be retried, the actor restarted or just go to the dead letters queue? I know there is a preference in this pattern for letting things fail but on the ergonomics side it's like writing old rust with tons of unwrap/expect and I'm still to early in the game to even know how to handle those panics later :sweat_smile:

igalic commented 4 years ago

the main problem with panics in rust is that they crash the entire programme, instead of just a single thread. those are vastly different ergonomics between Erlang and Rust, so we'll have to find the right / correct, and "best" way ourselves

filmor commented 4 years ago

That's not true. Panics are (by default) contained within their thread. Within the same thread they can be isolated using catch_unwind. That behaviour can be overridden at compile-time to abort instead, but the default is unwinding.

leenozara commented 4 years ago

Panics are handled in-thread and specifically with regards to Riker within the executor. If an actor panics the actor's mailbox is suspended and the parent supervisor determines how to handle the panic, as per the supervision strategy defined. The panic is isolated and no errors are leaked. If the supervisor decides to restart the child then a new actor of the same type, at the same path is started. The restarted actor follows the same life cycle as any new actor, including fresh state. The message that caused the actor to panic and any other values that the actor owned are removed from memory. Remaining messages in the mailbox will continue to be processed.

Technically it would be possible to return a future of a result that could be extracted. However, this breaks the actor model. It would mean that, in addition to an actor acting upon its expected message types at a mailbox level, it would also be acting upon out-of-bounds messages, in this case a Future<Result>>. Anything that might change an actor's state or behavior should be done through actor messages.

As a concrete example, if you are modeling an ecommerce payments flow with actors, you'd having a hierarchy like:

riker
└─ payment-manager
   └─ card
   └─ crypto
   └─ storecredit

If the customer is paying with card the manager (e.g. PaymentManager) forward the payment message to the Card actor, which in turn will create a child actor to handle that specific payment (e.g. StripeProcessor) and forward the same message to that child. The child will probably be initiating a network request, which returns a Future<Result>>. If the IO result is an error, a message (e.g. PaymentFailure<Reason>>) can be sent back to Card actor. The actor could decide to use an alternative card processor. If the IO result was a success a PaymentResult<Transaction>> can be sent back to the original sender, PaymentManager. This way, all state changes and behavior are isolated to message handling on recv.

This is the same reason why the ask pattern isn't advised inside an actor. It is advised to use ask to extract values from actor system to outside.

igalic commented 4 years ago

this reads like something that should be, almost verbatim, in our documentation

hardliner66 commented 4 years ago

It also should be documented that the supervision doesn't work if compiled with panic = abort, because catch_unwind only works if panics unwind.

Edit: It seems that the method of using catch_unwind is derived from the usage of exceptions in other languages. But a panic is not an exception. Maybe we should also look into other ways to handle errors to better match rust best practices.

filmor commented 4 years ago

There is not really another way apart from letting the respective scheduler thread die and starting a new one. catch_unwind stems from the need to not unwind into foreign code, for example if you have a callback function of a C library that panics. It's an isolation primitive, not a "classical" exception handling mechanism.

igalic commented 4 years ago

If the IO result was a success a PaymentResult<Transaction>> can be sent back to the original sender, PaymentManager. This way, all state changes and behavior are isolated to message handling on recv.

in all the examples I've seen so far, Sender has been set to None so in that sense, most examples seem to be fire and forget, we haven't seen any talk back

i feel like this could also help model #103.