ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0
42 stars 12 forks source link

Add support for timers in Portable Ray Runner #11

Closed pabloem closed 2 years ago

pabloem commented 2 years ago

The prototype outlined in https://github.com/ray-project/ray_beam_runner/pull/10 supports batch pipelines that read / write data.

For streaming workflows, the state and timer primitives exist, which are used to manage long-term persisted state, and managing event time / processing time in a streaming pipeline (see this blog post for more information).

In fact, windows and triggers (higher-level concepts) for Beam can be built using state and timers.

Timers are passed in a similar way that data is passed in Beam (serialized and sent to the runner by the worker), so we only need to process them and schedule them for execution.

Code pounters

pabloem commented 2 years ago

Useful literature: https://s.apache.org/beam-fn-api