ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] on_evicted callback #45468

Open Atry opened 5 months ago

Atry commented 5 months ago

Description

A Ray API to register a callback triggered when a Ray object is evicted.

Use case

Suppose I have a Ray actor that creates a Ray object associated with some non-serializable state. In the following example, the non-serializable state is a temporary directory.

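import tempfile

import ray

class MyObject:
    pass

# Per-process map from ObjectRef to the temporary directory associated with it.
_non_serializable_states = {}

@ray.remote
class MyCreatorActor:
    def create(self):
        ref = ray.put(MyObject())
        _non_serializable_states[ref] = tempfile.mkdtemp()
        return ref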

I want the following on_evicted function to run in the creator actor's worker process when ref is evicted, i.e., when Ray's distributed reference count for the object drops to zero.

import shutil

def on_evicted(ref):
    # Delete the temporary directory, then drop its entry from the map.
    shutil.rmtree(_non_serializable_states[ref])
    del _non_serializable_states[ref]

How can I modify MyCreatorActor.create to register on_evicted?

Is there a Ray API similar to weakref.ref(obj, callback) to register such a callback?
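For comparison, here is a minimal sketch of the single-process behavior that weakref.ref(obj, callback) provides, using only the Python standard library. It does not involve Ray; the names on_collected, watcher, and state_dir are made up for illustration. The requested Ray API would ideally behave like this across processes.

```python
import gc
import os
import shutil
import tempfile
import weakref

class MyObject:
    pass

obj = MyObject()
state_dir = tempfile.mkdtemp()  # stands in for the non-serializable state

def on_collected(dead_ref):
    # The callback receives the (now dead) weak reference, not the object.
    shutil.rmtree(state_dir, ignore_errors=True)

# The weak reference itself must stay alive, or the callback never fires.
watcher = weakref.ref(obj, on_collected)

del obj       # drop the last strong reference
gc.collect()  # not needed on CPython, but harmless on other interpreters
```

After the last strong reference is gone, watcher() returns None and the temporary directory has been removed.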

Atry commented 5 months ago

There is a question on StackOverflow about this issue: https://stackoverflow.com/questions/78421032/how-to-properly-clean-up-non-serializable-states-associated-with-a-ray-object

jjyao commented 5 months ago

@Atry We don't currently have this. I think you can do application-layer reference counting for your use case.

Atry commented 5 months ago

I don't think application-layer reference counting can be done reliably.

jjyao commented 5 months ago

Yes, but if your application is simple, you can use application-layer reference counting as a workaround for now to unblock yourself.
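A rough sketch of what the suggested application-layer reference counting could look like, written in plain Python so it runs without a cluster. StateRegistry and its create/acquire/release methods are hypothetical names, not a Ray API. In a Ray setting these methods would presumably live on the creator actor and be called remotely by every holder of the object, keyed by some stable identifier for the object.

```python
import os
import shutil
import tempfile

class StateRegistry:
    """Hypothetical helper: counts holders per key and cleans up at zero."""

    def __init__(self):
        self._counts = {}
        self._paths = {}

    def create(self, key):
        # The creator holds the first reference.
        path = tempfile.mkdtemp()
        self._paths[key] = path
        self._counts[key] = 1
        return path

    def acquire(self, key):
        # Every additional holder must call this explicitly.
        self._counts[key] += 1

    def release(self, key):
        # Clean up once the last holder lets go.
        self._counts[key] -= 1
        if self._counts[key] == 0:
            del self._counts[key]
            shutil.rmtree(self._paths.pop(key), ignore_errors=True)

registry = StateRegistry()
path = registry.create("obj-1")
registry.acquire("obj-1")          # a second holder appears
registry.release("obj-1")          # first holder is done
still_there = os.path.isdir(path)  # directory survives while one holder remains
registry.release("obj-1")          # last holder is done, directory is removed
```

This also shows the reliability concern raised above: a holder that crashes or forgets to call release leaves the count above zero, so the state is never cleaned up.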