ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0

[batch] Create Pipeline State Manager #4

Closed pdames closed 1 year ago

pdames commented 2 years ago

The Ray Pipeline State Manager is a central service that consolidates the execution state of all scheduled pipeline work items in Ray's object store. It should be based on the current single-process implementation used in Beam's FnApiRunner.

At a high level, this should be a Ray Actor that worker tasks use for (1) durable persistence of any ObjectRef for an object they have stored in Ray's object store via ref = ray.put(obj), and (2) on-demand retrieval of any persisted ObjectRef, which they can then materialize via obj = ray.get(ref).
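A minimal sketch of the interface described above, written as a plain Python class so the logic can be read and tested without a cluster. The class name, method names, and string keys are all hypothetical and not taken from the runner's code. The one Ray behavior it relies on is that an ObjectRef passed as a top-level argument to a task or actor method is resolved to its value before the call, so the ref is wrapped in a list to keep it a reference.

```python
from typing import Any, Dict, List


class PipelineStateManager:
    """Hypothetical state manager: maps string keys to object-store refs."""

    def __init__(self) -> None:
        # Holding a ref here keeps the referenced object alive for as long
        # as the manager itself is alive.
        self._refs: Dict[str, Any] = {}

    def persist(self, key: str, wrapped_ref: List[Any]) -> None:
        # The ref arrives wrapped in a one-element list (assumption of this
        # sketch) so that Ray passes the reference rather than the value.
        (ref,) = wrapped_ref
        self._refs[key] = ref

    def retrieve(self, key: str) -> List[Any]:
        # Return the ref wrapped the same way; the caller materializes it.
        return [self._refs[key]]

    def keys(self) -> List[str]:
        return sorted(self._refs)


# With Ray installed, the class could be turned into an actor, e.g.:
#   manager = ray.remote(PipelineStateManager).remote()
#   ref = ray.put(obj)
#   ray.get(manager.persist.remote("stage-1/output", [ref]))
#   (ref,) = ray.get(manager.retrieve.remote("stage-1/output"))
#   obj = ray.get(ref)
```

The keys shown ("stage-1/output") are placeholders; a real implementation would need a key scheme that matches how the runner identifies work items.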

The state manager should also support efficient, atomic checkpointing and restoration of all state persisted in Ray's in-memory object store to durable storage (e.g. local disk or a durable cloud storage service).
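One way the atomicity requirement might be met for the on-disk case is the usual write-to-temp-then-rename pattern, sketched below. This is an illustration under stated assumptions, not the runner's implementation: the function names are hypothetical, the state is assumed to be a dict of already materialized, picklable values (for example, the results of ray.get on each stored ref) rather than serialized ObjectRefs, and cloud storage is not covered.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def write_checkpoint(state: Dict[str, Any], path: str) -> None:
    """Atomically replace the checkpoint at `path` with `state`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the file in one step: readers see either the
        # old checkpoint or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path: str) -> Dict[str, Any]:
    """Load the state written by write_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

On restore, the loaded values would be put back into the object store (e.g. with ray.put) and the resulting refs registered with the state manager again.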

pdames commented 2 years ago

This work is required as part of https://github.com/ray-project/ray_beam_runner/issues/2

ericl commented 2 years ago

> The state manager should also support efficient, atomic checkpointing and restoration of all state persisted in Ray's in-memory object store to durable storage (e.g. on-disk or to a durable cloud storage service etc.).

This is interesting. We might have to do some work on improving the semantics of ObjectRef serialization (in particular its interaction with ref-counting), since right now the referenced objects are pinned in memory forever once a ref is exported. Hence, checkpointing may cause these objects to be leaked in the object store. cc @jjyao

pdames commented 2 years ago

@pabloem prototype: https://github.com/ray-project/ray_beam_runner/pull/6

pabloem commented 1 year ago

@iasoon implemented something like this