ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[core] Support detached/GCS owned objects #12635

Open ericl opened 3 years ago

ericl commented 3 years ago

Overview

In the current ownership model, every object is owned by a worker executing the job that created it. This means, however, that when a job exits, all objects it created become unavailable.

In certain cases it is desirable to share objects between jobs (e.g., a shared cache) without explicitly creating the objects from a detached actor.

We currently have an internal API that assigns ownership of an object created with ray.put() to a specific actor, e.g.:

ray.put(data, _owner=actor_handle)

This means the created object will fate-share with that actor instead of the current worker process. It is therefore already possible to create global objects by creating a named detached actor and setting that actor as the owner.

However, it would be nice to support ray.put(data, _owner="global") to avoid the need for that hack, and allow the object to be truly HA (e.g., tracked durably in HA GCS storage).

jovany-wang commented 2 years ago

> However, it would be nice to support ray.put(data, _owner="global") to avoid the need for that hack, and allow the object to be truly HA (e.g., tracked durably in HA GCS storage).

Do you mean the data itself is put into the backend storage? When and how do we clear it: by TTL or through an explicit API?

ericl commented 2 years ago

Only the metadata ownership would be handled by the GCS; everything else, including ref counting, remains the same.