Open sebastiangonsal opened 4 years ago
So I just ran a test, ray.get(futures)
results in an exception in the main thread. Changing to ray.wait
works fine (since ray.wait
returns a list of available and non-available object ids, instead of trying to returning all?).
I had a question about the remote actor. I had the following as my actor class:
@ray.remote
class RemoteActor:
def __init__(self):
pass
def do(self, i):
if i == 5:
raise Exception
print(i)
It printed all passed integers except for i=5 (that causes an exception). How did this happen? Did it restart the remote actor? How did it figure what args to pass to the constructor?
Just ran more tests.
So it seems the fault tolerance, is just about ray catching exceptions thrown by a remote actor? (I had a local var in the actor class, and the value remained unchanged after the exception.)
@sebastiangonsal Actor will be reconstructed with first args if it fail to finish task and throw exception.
Fault tolerance to dead actors
The documentation on actors fault tolerance is not clear whether the actors a re-created: https://ray.readthedocs.io/en/latest/fault-tolerance.html
The line mentions: by default any attempt to get an object from that actor that cannot be created will raise an exception
Does that mean, if i have a line like the following:
And for i=5, it generates an exception resulting in actor process killed. Then for i=6, the actor would be recreated automatically? (and how? what if the constructor has args)