open-telemetry / opentelemetry-rust-contrib

OpenTelemetry Contrib Packages for Rust
Apache License 2.0
41 stars 37 forks source link

Etw-Metrics - Improvements required #104

Open cijothomas opened 2 months ago

cijothomas commented 2 months ago

Tracking item for improvements to the Etw-Metrics exporter: https://github.com/open-telemetry/opentelemetry-rust-contrib/tree/main/opentelemetry-etw-metrics

  1. Avoid heap allocation. It currently uses a vec! for each export. We could re-use the vec!, or use array (allocate maximum etw size - 65360).
  2. It currently serialized the entire batch of metrics and sends them as one etw event. This makes it very easy to hit the 65360 limit on ETW. We need to send one metric at a time to ETW. This is how OTel .NET exports metrics.. This should significantly reduce the probability of hitting ETW size limits.
  3. Unit testing - it should be possible to write unit-tests where an ETW listener is setup and it can validate the metrics are flowing as expected.
cijothomas commented 2 months ago

@mattbodd @psandana Either of you have bandwidth to help with the above, please reply to this issue. Thanks in advance!

mattbodd commented 2 months ago

@cijothomas I'd be interested in taking this on!

cijothomas commented 2 months ago

Thanks @mattbodd 2 is higher priority as users are hitting the limit very easily currently. 1 is more like nice-to-have, so lower priority than 2.

cijothomas commented 2 months ago

@utpilla has corrected me that OTel .NET exports every metric point in one etw call, not one Metric. So the same can be followed here too, as that'd make it very safe (from hitting etw limits), even when future improvements like Exemplars come into picture!

mattbodd commented 2 months ago

Thanks @cijothomas for calling out priorities and sharing how the .NET implementation approaches this issue

cijothomas commented 1 month ago

@utpilla has corrected me that OTel .NET exports every metric point in one etw call, not one Metric. So the same can be followed here too, as that'd make it very safe (from hitting etw limits), even when future improvements like Exemplars come into picture!

The above need to be revisited as too many calls to ETW might overwhelm the receiver. So we need more smarter splitting strategy. Following has some ideas worth exploring. https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1161/files#diff-6f70559904f5f6d784f8c37f3bd633ce651418dd52944157aee5269071c66e8f