open-telemetry / opentelemetry-network

eBPF Collector
https://opentelemetry.io
Apache License 2.0
296 stars 46 forks source link

[RFC] Replacing reducer with stream-processing framework Arroyo, POC #263

Open yonch opened 7 months ago

yonch commented 7 months ago

Describe the issue you're reporting

The reducer processes collector telemetry to time series. In recent years, stream processing frameworks have emerged that can perform a similar task, with lower implementation complexity and standardized interfaces. Using a stream processing framework for the reducer should reduce the maintenance load and help more easily evolve the project.

One promising stream processing framework is Arroyo, which seems dual-licensed MIT/Apache2.0. Arroyo takes in pipeline definitions in SQL, and configures streaming workers to materialize the results. @mwylde gave a talk at Data Council'24: YouTube link and after discussing with him and @jacksonrnewhouse, it seems like it could be a good fit for the reducer.

How could such an implementation look?

To get more insight of how good a fit this is, and the amount of effort required, we should build an initial proof-of-concept using raw telemetry dumps from collectors to reducer, to Arroyo and demonstrate joins and aggregations. A minimal demo might stream 30 seconds of telemetry from a single kernel collector, and show output of total throughput by container.

Does this require a lot of development? It seems we have most of the tooling to make this work. The collectors already have the ability to dump the binary stream that would go to the reducer ("intake" data) to a file, see IntakeConfig and kernel_collector_test (config and instantiation), and it seems possible to control dumps via an environment variable. We have existing tooling to translate the binary data to JSON.

Arroyo accepts several source connectors. The POC might stream JSON to Fluvio(Apache-2.0) which Arroyo accepts. The Fluvio CLI accepts records from files or stdin, so while this adds another components to the POC, it appears it requires no code modifications, just configuration.

How will this look eventually? There are more steps to get there, but to provide a general idea if this isn't clear from above:

cc @open-telemetry/network-maintainers @open-telemetry/network-approvers @open-telemetry/network-triagers