open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.88k stars 438 forks source link

Memory optimization for metric datapoints #1566

Open cijothomas opened 8 months ago

cijothomas commented 8 months ago

Metric datapoints currently store the starttime and endtime of the data point.

Since starttime and endtime will be same for each and every datapoint within a metric, it can be moved to the Sum struct itself, instead of repeating it for each datapoint. (Similar for Gauge/Histograms too).

cijothomas commented 8 months ago

This will be a breaking change for folks authoring metric exporters, and must be done before SDK stable release.

hdost commented 8 months ago

Is that the case ? Start time should be the first point in time something existed End time is usually when the datapoint was recorded.

And https://github.com/open-telemetry/opentelemetry-proto/blob/c5c8b28012583fda55b0cb16f73a820722171d49/opentelemetry/proto/metrics/v1/metrics.proto#L176

I would imagine starttime could be moved up, but that the time/endtime would need to stay under.

cijothomas commented 8 months ago

Is that the case ?

Yes. Every datapoint/metricpoint(time-series) in the same metric has same start/end time for a collection.

hdost commented 8 months ago

Is that the case ?

Yes. Every datapoint/metricpoint(time-series) in the same metric has same start/end time for a collection.

Start time isn't intended to be based on collection time, but observations time. If a datapoint was first observed for series A at T1 then it should be T1 until the process dies. If the last observation occurs at T3 then start should be T1 and "endtime" or really just "time" should be T3.

// StartTimeUnixNano in general allows detecting when a sequence of // observations is unbroken. This field indicates to consumers the // start time for points with cumulative and delta // AggregationTemporality, and it should be included whenever possible // to support correct rate calculation. Although it may be omitted // when the start time is truly unknown, setting StartTimeUnixNano is // strongly encouraged.

From the proto

cijothomas commented 8 months ago

What does the proto tell? I am not able to follow your concern. Could you elaborate or send a unit test showing the issue?

hdost commented 8 months ago

Perhaps for sum its not an issue since we are just adding all structures together, but if in a collection for a histogram you need to aggregate latencies. The observation time matters.

cijothomas commented 8 months ago

Yes I can see your point, but the current implementation in this repo does not support that - it uses same start and end for all datapoints. Also, I don't think we ever record the precise time at which the measurement was reported (too much perf cost) - we only say it is between the start and end time. It is possible to go more precise, at the cost of more perf.

The data model has some pictures (Histograms and Sums), which is aligned with my proposal - the actual recordings occur somewhere between t1 and t2. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#histogram

Anyway, I'll send a PR with the change.

lalitb commented 8 months ago

Just to be clear, start-time could be moved up, and end-time/aggregation-time would remain in data point - is it so?

https://github.com/open-telemetry/opentelemetry-proto/blob/9d139c87b52669a3e2825b835dd828b57a455a55/opentelemetry/proto/metrics/v1/metrics.proto#L156C1-L157C17

// Data points with the 0 value for TimeUnixNano SHOULD be rejected // by consumers.