ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0
41 stars 12 forks source link

Design a path for work items to report progress #55

Open pabloem opened 1 year ago

pabloem commented 1 year ago

The Beam API implements several RPCs for processing of data.

The main RPC is the ProcessBundle RPC, which we've implemented.

During the execution of a ProcessBundle rpc, the Beam worker consumes data to process, and outputs data. While this request is being executed, the runner can request progress reports from the worker. These progress reports are requested via the ProcessBundleProgress rpc.

This work item consists on:

Some ideas:

This is just one idea, but we can do things in different ways.

pabloem commented 1 year ago

@valiantljk this is something that you could consider taking a stab with? : )

you'd have to add some smart code so that ongoing ray_execute_bundle tasks can report progress - a bit of Beam code to assemble the ProcessBundleProgressRequest protos, and some Ray code to create actors/threads to report results

pabloem commented 1 year ago

lmk if that helps

iasoon commented 1 year ago

For how to do the ProcessBundleProgressRequests, you can check the original FnApiRunner code, I think there's a basic implementation there.