open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.71k stars 887 forks source link

Decide on requirement levels for all JVM metrics and their attributes #3416

Closed trask closed 1 year ago

trask commented 1 year ago

3413 marks all the metrics as recommended initially, but we should decide if some of them are required for a compliant implementation, and if some of them should be only opt-in.

It also marks all the metric attributes as recommended initially, but we should similarly decide if some of them are required for a compliant implementation, and if some of them should be only opt-in.

jack-berg commented 1 year ago

decide if some of them are required for a compliant implementation

I'd like to see each metric have exactly one implementation - either the JMX or JFR based source depending on what's convenient. (Would graal need a separate source for these? I remember @ roberttoyonaga mentioning that certain JFR events are available and there are is maybe some JMX support coming?) If this is the case then all attributes should be required.

Maybe you could make the case that certain attributes like gc and action from metric.process.runtime.jvm.gc.duration could be opt-in, but all the attributes we've added so far in for the JVM are low cardinality so making attributes opt-in arguably needlessly complicates things.

roberttoyonaga commented 1 year ago

Would graal need a separate source for these

I'm not sure the actual sources would be different. I think it would still have to be JMX/JFR. It's just that some of the PlatformMXBeans are not fully implemented in SubstrateVM and not all the JFR VM inspection events built into Hotspot are implemented in SubstrateVM either. So if all the metrics/attributes are required, Graal native image would not meet the specification yet (which I guess might not be a big deal).

I'd like to see each metric have exactly one implementation

What about the existing JMX metric gatherers I was re-implementing with JFR a while ago? Should we keep them so in the future we have the option to transition to an entirely JFR implementation, or would you prefer to remove them for simplicity/easier maintenance?

jack-berg commented 1 year ago

What about the existing JMX metric gatherers I was re-implementing with JFR a while ago? Should we keep them so in the future we have the option to transition to an entirely JFR implementation, or would you prefer to remove them for simplicity/easier maintenance?

Even if it were possible to have an entirely JFR implementation, its hard to imagine why that would be preferred to JMX for the metrics we've already established. The JMX implementations are so simple and efficient that as long as the data is available, we should use it. Maybe future java implementations stop implementing the JMX platform beans, but we should cross that bridge when we come to it.

trask commented 1 year ago

I believe we have agreed that all metrics should be recommended, except for any metrics not under process.runtime.jvm.*, since those could also be emitted by the OpenTelemetry Collector leading to double aggregation across e.g. a pod/cluster

And we have agreed that all attributes should be recommended.

TODO: send PR to mark process.runtime.jvm.system.cpu.utilization and process.runtime.jvm.system.cpu.load_1m as opt-in

zeitlinger commented 1 year ago

Not sure I'm doing the right thing..

  - id: metric.process.runtime.jvm.system.cpu.utilization
    type: metric
    metric_name: process.runtime.jvm.system.cpu.utilization
    requirement_level: opt_in
docker run --rm -v /home/gregor/source/otel-semantic-conventions/semantic_conventions:/source -v /home/gregor/source/otel-semantic-conventions/specification:/spec \
        otel/semconvgen:0.18.0 -f /source markdown -md /spec
Error parsing /source/metrics/process-runtime-jvm-metrics.yaml

Invalid keys: ['requirement_level'] - @140:5
zeitlinger commented 1 year ago

recommended .. except for any metrics not under process.runtime.jvm.*

system.cpu.utilization is one of those - but it's not part of https://github.com/open-telemetry/semantic-conventions/blob/main/semantic_conventions/metrics/process-runtime-jvm-metrics.yaml - am I missing something?

trask commented 1 year ago

Invalid keys: ['requirement_level'] - @140:5

oh ya, requirement_level is not supported in the metrics yaml (yet).

for now we just have to update the markdown directly, e.g. here:

https://github.com/open-telemetry/semantic-conventions/blob/eacb63da1139c8802448614c040d40cd321d6602/specification/metrics/semantic_conventions/runtime-environment-metrics.md?plain=1#L317

trask commented 1 year ago

recommended .. except for any metrics not under process.runtime.jvm.*

system.cpu.utilization is one of those - but it's not part of https://github.com/open-telemetry/semantic-conventions/blob/main/semantic_conventions/metrics/process-runtime-jvm-metrics.yaml - am I missing something?

I don't think we need to do anything about system.cpu.utilization at this point. We'll need to tackle that post-stability in https://github.com/open-telemetry/semantic-conventions/issues/41.