ray-project / ray_beam_runner

Ray-based Apache Beam runner
Apache License 2.0
41 stars 12 forks source link

Support metrics aggregation in Ray runner #56

Closed valiantljk closed 1 year ago

valiantljk commented 1 year ago

Related Issue: https://github.com/ray-project/ray_beam_runner/issues/47

This PR implemented the metrics aggregation in the Ray runner. Supported metrics include:

  1. user-defined element-count metrics, e.g., counter, gauge, distribution (min, max, avg, etc)
  2. non-user metrics, e.g., start_bundle_msecs, total_msecs, etc

The PR also provides an example using user-defined metrics in word-count. To run the example:

python examples/word_count_metrics.py --input input.txt

With a random generated input.txt, the expected output should be like:

Ray Runner--------------->
number of empty lines:98
average word length:4.18
min word length: 1
max word length: 14
Direct Runner--------------->
number of empty lines:98
average word length:4.18
min word length: 1
max word length: 14
pabloem commented 1 year ago

this is really great. Thanks! : )