Aggregate metrics within the main thread

When a Beam bundle is executed, a list of metrics is returned with the result. These metrics measure the occurrence of certain events. For example:

import apache_beam as beam
from apache_beam.metrics import Metrics

class MyDoFn(beam.DoFn):
    def process(self, element):
        # Counters are identified by a (namespace, name) pair.
        Metrics.counter('sample_dofn', 'sum_of_events').inc(element)
        Metrics.counter('sample_dofn', 'count_of_events').inc(1)

with beam.Pipeline() as p:
    (p
     | beam.Create([1, 2, 3])
     | beam.ParDo(MyDoFn()))

# => Conceptually reports {('sample_dofn', 'sum_of_events'): 6, ('sample_dofn', 'count_of_events'): 3}
p.result.metrics()

The metrics are reported in the InstructionResponse object that we recover after each bundle execution: https://github.com/ray-project/ray_beam_runner/blob/master/ray_beam_runner/portability/ray_fn_runner.py#L332

Here's the definition of the InstructionResponse:

https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L120-L144

And specifically, this contains a ProcessBundleResponse, which has monitoring_data:

https://github.com/apache/beam/blob/master/model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto#L352

So what we want to do is take the metrics from each bundle result and aggregate them into a single, unified view.
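As a rough illustration of what "aggregate" means here, the sketch below sums per-bundle counter values by their (namespace, name) key. It uses plain dicts as a simplified stand-in for the monitoring data carried by a real ProcessBundleResponse; `merge_counters` and the sample bundle values are hypothetical and not part of Beam or ray_beam_runner.

```python
from collections import Counter

def merge_counters(bundle_counters):
    """Sum counter values across bundles, keyed by (namespace, name)."""
    total = Counter()
    for counters in bundle_counters:
        # Counter.update adds to existing values instead of replacing them.
        total.update(counters)
    return dict(total)

# Hypothetical per-bundle results: elements [1, 2] in one bundle, [3] in another.
bundle_a = {('sample_dofn', 'sum_of_events'): 3, ('sample_dofn', 'count_of_events'): 2}
bundle_b = {('sample_dofn', 'sum_of_events'): 3, ('sample_dofn', 'count_of_events'): 1}

merged = merge_counters([bundle_a, bundle_b])
```

Summing is only the right combine step for counters; other metric types (for example distributions, which track min and max) would need their own merge logic.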

Here's an example of code doing that in the local runner. We can probably copy that code more or less as-is into our runner:

https://github.com/apache/beam/blob/1427b7dd93ff2787d3798e7a558efbc13d460257/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py#L445-L455

It would probably go around the spot where we recover the bundle result.
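To make the placement concrete, here is a minimal sketch of a driver loop that folds metrics into a per-stage view at the point where each bundle result comes back. It is not the ray_beam_runner code: a thread pool stands in for remote bundle execution, and `run_bundle`, `run_stage_bundles`, and the stage name are hypothetical. The point is only that the shared dict is written from the main thread, so no locking is needed.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_bundle(stage_name, elements):
    """Hypothetical stand-in for a bundle execution; returns (stage, counters)."""
    counters = {
        ('sample_dofn', 'sum_of_events'): sum(elements),
        ('sample_dofn', 'count_of_events'): len(elements),
    }
    return stage_name, counters

def run_stage_bundles(bundles):
    # stage name -> {(namespace, name): value}; only the main thread writes here.
    metrics_by_stage = defaultdict(lambda: defaultdict(int))
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_bundle, stage, elems) for stage, elems in bundles]
        for future in as_completed(futures):
            # This is the spot where the bundle result is recovered.
            stage, counters = future.result()
            for key, value in counters.items():
                metrics_by_stage[stage][key] += value
    return {stage: dict(counters) for stage, counters in metrics_by_stage.items()}

aggregated = run_stage_bundles([('stage_1', [1, 2]), ('stage_1', [3])])
```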

Finally, another code sample that is less important but worth knowing about is the part of the SDK worker that actually populates these metrics: https://github.com/apache/beam/blob/0e61b026ea7accd666fc443f3aeec7f93147a3b6/sdks/python/apache_beam/runners/worker/sdk_worker.py#L635-L646
