slawlor / ractor

Rust actor framework
MIT License
1.51k stars 72 forks source link

`post_stop` of children are being called when supervisor fails. #226

Closed gusinacio closed 6 months ago

gusinacio commented 7 months ago

Describe the bug Imagine a supervision tree with A -> B. If A panics its post_stop is not called as expected but ractor terminates all its children via normal stop with reason "Killed" and then post_stop is called for this situation.

The problem is that there's no way to identify on B's post_stop that it's being terminated because its supervisor was killed.

The documentation clearly says about post_stop:

Invoked after an actor has been stopped to perform final cleanup. In the event the actor is terminated with Signal::Kill or has self-> panicked, post_stop won't be called. Panics in post_stop follow the supervision strategy.

For me, the actor should not trigger post_stop because it's a killed event or post_stop should have a parameter to identify the reason.

To Reproduce Steps to reproduce the behavior:

  1. Spawn an Actor A with post_stop
  2. Spawn linked Actor B with post_stop
  3. Send a message to actor A that panics/return an error.
  4. See the post_stop from B being called.

Expected behavior When Actor A panics, it should not trigger post_stop of Actor B.

Additional context Ractor Version: 0.9.7 Rust Version: 1.76.0

slawlor commented 6 months ago

Yes this shouldn't happen, I'll investigate and work on a fix. Thank you for reporting!

contrun commented 3 weeks ago

Should we run post_stop of the child when the parent was stopped on request (normal operation instead of panics)? It was my understanding that when the actor is stopped, we can use the post_stop function to reclaim the resources, and so I used a sort of "root" actor to keep track of all the actors and did some clean up work in the child actor's post_stop function. Now, below code will print "parent stopped" and "child stopped" before 0.9.8. Now it only prints "parent stopped". It was surprising to me, as I thought supervisor tree will make the management sub-tasks life cycles easier. Without automatic resource management, the usefulness of supervisor tree would be greatly reduced.

#!/usr/bin/env -S cargo +nightly-2024-10-01 -Zscript
---
[package]
name = "ractor-test"
version = "0.1.0"
edition = "2021"

[dependencies]
ractor = { version = "=0.9.7" }
tokio = { version = "1.40.0", features = ["full"] }
---

use ractor::{async_trait, Actor, ActorProcessingErr, ActorRef};

struct MyActor {
    name: String,
}

#[async_trait]
impl Actor for MyActor {
    type Msg = ();

    type State = ();

    type Arguments = ();

    async fn pre_start(
        &self,
        _myself: ActorRef<Self::Msg>,
        _: (),
    ) -> Result<Self::State, ActorProcessingErr> {
        Ok(())
    }

    async fn handle(
        &self,
        _myself: ActorRef<Self::Msg>,
        _message: Self::Msg,
        _state: &mut Self::State,
    ) -> Result<(), ActorProcessingErr> {
        Ok(())
    }

    async fn post_stop(
        &self,
        _myself: ActorRef<Self::Msg>,
        _state: &mut Self::State,
    ) -> Result<(), ActorProcessingErr> {
        println!("{} stopped", &self.name);
        Ok(())
    }
}

#[tokio::main]
async fn main() {
    let parent_actor = MyActor {
        name: "parent".to_string(),
    };
    let (parent_actor_ref, parent_actor_handle) =
        Actor::spawn(Some("parent".to_string()), parent_actor, ())
            .await
            .unwrap();
    let child_actor = MyActor {
        name: "child".to_string(),
    };
    let (_child_actor_ref, child_actor_handle) = Actor::spawn_linked(
        Some("child".to_string()),
        child_actor,
        (),
        parent_actor_ref.get_cell(),
    )
    .await
    .unwrap();
    parent_actor_ref.stop(Some("stop".to_string()));
    parent_actor_handle.await.unwrap();
    child_actor_handle.await.unwrap();
}
slawlor commented 3 weeks ago

So this is actually somewhat by design. If you want graceful stop of children, the supervisor should call stop to the children and block on their exit from its own post_stop routine (like the factory does). You can grab the ActorCell from the supervision tree in the actor's properties directly if you want (I think).

Otherwise it's going to use kill() to cleanup child actors, and propogate exits down the tree. There was a bug that was fixed (probably in 0.9.8) where kill() was calling post-stop, but it shouldn't, as it's meant to immediately halt execution of the actor (i.e. fast exit, interrupting async work). This is the default exit flow of supervision.

contrun commented 3 weeks ago

Yeah. So basically I need to run child_actor.stop(None) in my "root" actor, right? I think I have to save all the references to my children actors in the root actor, so as to stop them gracefully. Do we have any API for me to easily iterate over all my children and kill them one by one (LOL, that's brutal)?

slawlor commented 3 weeks ago

The information is readily available in the supervision tree, so if it's not accessible today, it would be trivial to add a stop_children and perhaps stop_children_and_wait to the ActorCell. That way you don't have to keep handles around, etc.

If you don't mind, open a new issue to track it?