To support Beam pipeline graph construction, optimization, and submission for execution, we need to build a Ray Runner class based on Beam Portability API's FnApiRunner that supports both local (i.e. single-node) and remote (i.e. multi-node) pipeline execution.
At a high-level, its run_pipeline method should take a Beam pipeline DAG to run as input, and produce a PipelineResult that can describe/manage pipeline execution state as output. The Ray Runner Beam Components Doc provides additional details about the FnApiRunner's end-to-end workflow, and how it relates to Ray.
This runner will also depend on successful implementation of the following components:
Ray Work Item Scheduler: The Ray Work Item Scheduler takes work items from the batch FnApiRunner's topological scheduler as input, and submits them for execution as Ray tasks.
Ray Pipeline State Manager: The Ray Pipeline State Manager is a central service that consolidates the execution state of all scheduled pipeline work items in Ray's object store.
To support Beam pipeline graph construction, optimization, and submission for execution, we need to build a Ray Runner class based on Beam Portability API's FnApiRunner that supports both local (i.e. single-node) and remote (i.e. multi-node) pipeline execution.
At a high-level, its
run_pipeline
method should take a Beam pipeline DAG to run as input, and produce aPipelineResult
that can describe/manage pipeline execution state as output. The Ray Runner Beam Components Doc provides additional details about the FnApiRunner's end-to-end workflow, and how it relates to Ray.This runner will also depend on successful implementation of the following components: