ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.71k stars 5.73k forks source link

Fault tolerance to dead actors #6670

Open sebastiangonsal opened 4 years ago

sebastiangonsal commented 4 years ago

Fault tolerance to dead actors

The documentation on actors fault tolerance is not clear whether the actors a re-created: https://ray.readthedocs.io/en/latest/fault-tolerance.html

The line mentions: by default any attempt to get an object from that actor that cannot be created will raise an exception

Does that mean, if i have a line like the following:

remote_actor = RemoteActor()
futures = [remote_actor.do.remote(i) for i in range(10)]
ray.get(futures)

And for i=5, it generates an exception resulting in actor process killed. Then for i=6, the actor would be recreated automatically? (and how? what if the constructor has args)

sebastiangonsal commented 4 years ago

So I just ran a test, ray.get(futures) results in an exception in the main thread. Changing to ray.wait works fine (since ray.wait returns a list of available and non-available object ids, instead of trying to returning all?).

I had a question about the remote actor. I had the following as my actor class:

@ray.remote
class RemoteActor:
    def __init__(self):
        pass

    def do(self, i):
        if i == 5:
            raise Exception
        print(i)

It printed all passed integers except for i=5 (that causes an exception). How did this happen? Did it restart the remote actor? How did it figure what args to pass to the constructor?

sebastiangonsal commented 4 years ago

Just ran more tests.

So it seems the fault tolerance, is just about ray catching exceptions thrown by a remote actor? (I had a local var in the actor class, and the value remained unchanged after the exception.)

ashione commented 4 years ago

@sebastiangonsal Actor will be reconstructed with first args if it fail to finish task and throw exception.