open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java

Unexpected aggregation temporality for process.runtime.jvm.gc.duration #7273

Closed by PeterF778 1 year ago

PeterF778 commented 1 year ago

I'm using Java agent 1.20 and looking at the new histogram-based metric process.runtime.jvm.gc.duration.

It is reported as a cumulative metric, which seems much less useful than it would be with delta aggregation temporality. In particular, the min and max values are almost useless, because there's no way to tell how long ago they were encountered. A metric post-processor also cannot correctly recover them when converting the metric to deltas.
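To illustrate the point about min and max, here is a minimal, self-contained sketch in plain Java. It does not use the OpenTelemetry SDK; the class `GcTemporalityDemo`, its `Point` type, and the pause values are all made up for illustration. It compares what a cumulative histogram point and a delta histogram point would carry for the same GC pauses, and shows that subtracting two cumulative points recovers count and sum but not min or max.

```java
import java.util.Arrays;
import java.util.List;

public class GcTemporalityDemo {

    /** Hypothetical histogram point: count, sum, min and max of GC pauses in milliseconds. */
    public static final class Point {
        public final long count;
        public final double sum;
        public final double min;
        public final double max;

        Point(long count, double sum, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "count=" + count + " sum=" + sum + " min=" + min + " max=" + max;
        }
    }

    /** Cumulative point: summarizes every pause from process start through interval upTo. */
    public static Point cumulative(List<double[]> intervals, int upTo) {
        long count = 0;
        double sum = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i <= upTo; i++) {
            for (double pause : intervals.get(i)) {
                count++;
                sum += pause;
                min = Math.min(min, pause);
                max = Math.max(max, pause);
            }
        }
        return new Point(count, sum, min, max);
    }

    /** Delta point: summarizes only the pauses recorded during interval idx. */
    public static Point delta(List<double[]> intervals, int idx) {
        return cumulative(Arrays.asList(intervals.get(idx)), 0);
    }

    /**
     * Best-effort cumulative-to-delta conversion by a post-processor.
     * Count and sum subtract cleanly; min and max are not recoverable, so they are NaN here.
     */
    public static Point subtract(Point current, Point previous) {
        return new Point(current.count - previous.count, current.sum - previous.sum,
                Double.NaN, Double.NaN);
    }

    public static void main(String[] args) {
        // Made-up pauses (ms) for three consecutive reporting intervals.
        List<double[]> intervals = Arrays.asList(
                new double[] {12, 250, 8},
                new double[] {5, 9},
                new double[] {7, 11, 6});
        for (int i = 0; i < intervals.size(); i++) {
            System.out.println("interval " + i);
            System.out.println("  cumulative: " + cumulative(intervals, i));
            System.out.println("  delta:      " + delta(intervals, i));
            if (i > 0) {
                Point converted = subtract(cumulative(intervals, i), cumulative(intervals, i - 1));
                System.out.println("  converted:  " + converted);
            }
        }
    }
}
```

With these made-up values, the cumulative max stays at 250 ms in every interval after the first, while the delta max for the second interval is 9 ms. The converted point gets the right count and sum for each interval but has no way to fill in min or max.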

trask commented 1 year ago

hey @PeterF778, can you use OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta? (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk_exporters/otlp.md#additional-configuration)

PeterF778 commented 1 year ago

I found the issue while looking at the output of LoggingMetricExporter, which does not seem to be configurable. Yes, using the setting you mentioned solves the issue for real use cases, but I still believe there's a problem.

With the new GC metrics, I have been looking forward to getting the maximum GC time per reporting interval. These times are really useful for customers. They would be a poor man's solution to issue #6870. It would be great if these numbers were available with the default agent configuration.

Perhaps we need a way to provide hints about the desired aggregation temporality at the metric level (an SDK extension), or a default view configuration that would do that.

trask commented 1 year ago

I added this to Thursday's SIG agenda for discussion

fstab commented 1 year ago

I'm wondering if you could use Exemplars. Exemplars are a way to attach example observations to metrics. So if you have an interesting observation (like the longest GC pause over the past 5 minutes) you might want to attach it as an Exemplar to the histogram.

Exemplars are often used for linking to example traces. However, the concept is very generic. Exemplars can as well be used for scenarios that have nothing to do with tracing. The data model for Exemplars contains an observation (GC pause time) and an optional list of key/value pairs for additional info.

I'm not sure if the API of the Java metrics SDK supports custom Exemplars, but on the data model level this would be perfectly possible, and it would also be compatible with Prometheus.
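As a rough illustration of the idea, the sketch below models an exemplar as a plain Java object holding an observation, a timestamp, and optional key/value pairs, and keeps the longest pause seen since the last collection. Everything here is hypothetical: `ExemplarSketch`, `Exemplar`, and `MaxPauseTracker` are not types from the OpenTelemetry Java SDK, the attribute keys and values are only examples, and whether the SDK lets instrumentation attach custom exemplars is left open, as noted above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExemplarSketch {

    /** Hypothetical stand-in for an exemplar: one observation plus optional attributes. */
    public static final class Exemplar {
        public final double value;                   // observed GC pause in milliseconds
        public final long epochNanos;                // when the observation was made
        public final Map<String, String> attributes; // optional extra key/value pairs

        Exemplar(double value, long epochNanos, Map<String, String> attributes) {
            this.value = value;
            this.epochNanos = epochNanos;
            this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
        }
    }

    /** Remembers the longest pause recorded since the previous collectAndReset() call. */
    public static final class MaxPauseTracker {
        private Exemplar longest;

        public synchronized void record(double pauseMs, long epochNanos, Map<String, String> attributes) {
            if (longest == null || pauseMs > longest.value) {
                longest = new Exemplar(pauseMs, epochNanos, attributes);
            }
        }

        /** Returns the longest pause of the finished interval (or null) and starts a new interval. */
        public synchronized Exemplar collectAndReset() {
            Exemplar result = longest;
            longest = null;
            return result;
        }
    }

    public static void main(String[] args) {
        MaxPauseTracker tracker = new MaxPauseTracker();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("gc", "G1 Young Generation"); // illustrative attribute only
        tracker.record(12.0, System.nanoTime(), attrs);
        tracker.record(250.0, System.nanoTime(), attrs);
        tracker.record(8.0, System.nanoTime(), attrs);

        Exemplar longest = tracker.collectAndReset();
        System.out.println("longest pause this interval: " + longest.value + " ms " + longest.attributes);
    }
}
```

Note that this puts the per-interval bookkeeping on the data collection side, which is extra code that a delta histogram would not need.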

PeterF778 commented 1 year ago

@fstab An interesting idea, but we do not have data for the longest GC pause over a recent interval. A histogram calculates this value automatically, so there's no need for any additional code on the data collection side.

trask commented 1 year ago

Related to https://github.com/open-telemetry/semantic-conventions/issues/274

trask commented 1 year ago

@PeterF778 do you think your concern could be resolved by https://github.com/open-telemetry/semantic-conventions/issues/274? (we will be discussing potential bucket boundaries in tomorrow's SIG meeting)

PeterF778 commented 1 year ago

Not really, I think these are separate issues.

trask commented 1 year ago

I don't think we can do anything about the default OTLP exporter temporality, since that's part of the stable spec. @PeterF778 is it ok to close this? thx

PeterF778 commented 1 year ago

Yes, if we cannot do anything about it, it makes no sense to keep it open.