ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0
42 stars 12 forks source link

Start designing shuffling algorithm #26

Open pabloem opened 2 years ago

pabloem commented 2 years ago

When a stage sends its output, we want to start using that to shuffle data to downstream stages.

https://github.com/ray-project/ray_beam_runner/blob/86bfcdd5705b2e689d0aff0f02b6cf46535c88d0/ray_beam_runner/portability/execution.py#L94-L108

Example of shuffle implementation for Ray Datasets 1.13: https://github.com/ray-project/ray/pull/23758