A performance test for the Ray Beam portable runner

ray-project / ray_beam_runner

Ray-based Apache Beam runner

Apache License 2.0


A performance test for the Ray Beam portable runner #21

Open pabloem opened 2 years ago

pabloem commented 2 years ago

It would be great to have a microbenchmark and a larger benchmark to measure our progress.

wilsonwang371 commented 2 years ago

I think I can spend some time working on this. This can help me get familiar with our current code and running environment.

pabloem commented 2 years ago

That would be great! We can track performance using this action: https://github.com/benchmark-action/github-action-benchmark

I don't think it needs to be very big. If it processes 1 GB running locally with the current implementation, we should get a number that we can track and improve over time.
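As a rough sketch of how a timing could be recorded for that action: the snippet below times a callable and serializes the result as a JSON list of `name`/`unit`/`value` entries, which is the shape I understand the action's custom "smaller is better" input to expect (an assumption worth checking against the action's README). The function names `time_run` and `to_benchmark_json` are hypothetical, not part of this repo.

```python
import json
import time


def time_run(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def to_benchmark_json(name, seconds):
    """Serialize one result as a JSON list of name/unit/value entries.

    Assumption: this matches the custom "smaller is better" format of
    github-action-benchmark; verify against the action's documentation.
    """
    return json.dumps([{"name": name, "unit": "seconds", "value": seconds}])


if __name__ == "__main__":
    # Placeholder workload; a real benchmark would run a pipeline here.
    elapsed = time_run(lambda: sum(range(1_000_000)))
    print(to_benchmark_json("placeholder_workload", elapsed))
```

Taking the best of a few runs is one simple way to reduce noise on shared CI machines; a mean or median would work as well.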

pabloem commented 2 years ago

We have a few microbenchmarks in Beam that you could use as inspiration, but I don't think they're big enough to test our runner and optimize over time:

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py (e.g., instead of creating 1000 elements, we could add a source that outputs more data, around 1 GB)
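A minimal sketch of what such a larger source could look like, assuming fixed-size synthetic records fanned out over a number of shards. The names `synthetic_records` and `build_pipeline`, the 1 KiB record size, and the shard count are all hypothetical choices for illustration; the runner under test would be supplied when constructing the `beam.Pipeline`.

```python
def synthetic_records(total_bytes, record_size=1024):
    """Yield fixed-size byte records totalling about total_bytes.

    Any remainder smaller than one record is dropped.
    """
    record = b"x" * record_size
    for _ in range(total_bytes // record_size):
        yield record


def build_pipeline(pipeline, total_bytes=1 << 30, num_shards=100,
                   record_size=1024):
    """Attach a synthetic ~1 GiB workload to an existing Beam pipeline.

    Beam is imported lazily so the generator above can be used without it.
    """
    import apache_beam as beam

    per_shard = total_bytes // num_shards
    return (
        pipeline
        # One seed element per shard, so generation can be spread out.
        | "Shards" >> beam.Create(range(num_shards))
        | "Generate" >> beam.FlatMap(
            lambda _: synthetic_records(per_shard, record_size))
        # Reduce to a single number so the output stays small.
        | "Sizes" >> beam.Map(len)
        | "TotalBytes" >> beam.CombineGlobally(sum)
    )
```

Generating the data inside a `FlatMap` rather than passing 1 GB to `beam.Create` keeps the pipeline graph small, so the measurement reflects the runner moving data rather than pipeline construction.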