open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Invalid unit for process.runtime.jvm.system.cpu.load_1m #9492

Closed mviitane closed 10 months ago

mviitane commented 1 year ago

Describe the bug

I'm running the OpenTelemetry demo application with the latest Java agent, 1.30.0.

The unit for process.runtime.jvm.system.cpu.load_1m seems wrong.

otel-col  | Metric #2
otel-col  | Descriptor:
otel-col  |      -> Name: process.runtime.jvm.system.cpu.load_1m
otel-col  |      -> Description: Average CPU load of the whole system for the last minute
otel-col  |      -> Unit: {run_queue_item}
otel-col  |      -> DataType: Gauge
otel-col  | NumberDataPoints #0
otel-col  | StartTimestamp: 2023-09-18 15:17:30.871407719 +0000 UTC
otel-col  | Timestamp: 2023-09-18 15:18:30.871396011 +0000 UTC
otel-col  | Value: 0.300293

Steps to reproduce

First, update the Java agent version for the demo app: https://github.com/open-telemetry/opentelemetry-demo/pull/1132

Configure a logging exporter for the Collector like this:

exporters:
  logging/detailed:
    verbosity: detailed

Start the demo app.

Check logs: $ docker compose logs otelcol | grep -A 8 -B 2 process.runtime.jvm.system.cpu.load_1m

Expected behavior

It should be like this: Unit: 1

Actual behavior

This is what I saw: Unit: {run_queue_item}

Javaagent or library instrumentation version

v1.30.0

Environment

Docker desktop on macOS

Additional context

No response

trask commented 1 year ago

hi @mviitane! this unit looks correct to me: https://github.com/open-telemetry/semantic-conventions/blob/875cfefe7d143aca5258c625c98d87d3c95aba46/model/metrics/jvm-metrics-experimental.yaml#L32

mviitane commented 1 year ago

OK, interesting. I was referring to this page with Unit: 1: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#metric-processruntimejvmsystemcpuload_1m

mviitane commented 1 year ago

Hi @trask, I agree that the unit is following the new semantic convention. However, the metric name is not updated, and it follows the old semantic conventions. So is there still a mismatch between the metric name and the unit?

New semantic conventions: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/jvm/jvm-metrics.md#metric-jvmsystemcpuload_1m

Old semantic conventions: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#metric-processruntimejvmsystemcpuload_1m

mateuszrzeszutek commented 1 year ago

Hey @mviitane ,

We're planning to include all the changes in the JVM metrics semconv in the 2.0 release; that includes changing the names to the ones you can currently see in the semconv repo. We probably didn't think too much of it and merged the unit name change before that, sorry for the confusion. @trask should we revert that change? I don't think we're treating the unit name (not semantics!) change as breaking, the spec doesn't really say much about that though.
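To make the "unit name, not semantics" point concrete, here is a minimal sketch of where a value like 0.300293 can come from. It assumes the agent reads this metric from the JDK's `OperatingSystemMXBean.getSystemLoadAverage()`, which the thread itself does not confirm. The class and method names (`LoadAverageProbe`, `systemLoadAverage1m`, `isAvailable`) are hypothetical. Per the JDK documentation, the returned number is an averaged count of runnable entities (queued plus running), not a 0..1 utilization ratio, and a negative value means the platform does not report it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverageProbe {

    // Reads the system load average for the last minute from the JDK.
    // Assumption: this is the same source the agent uses for the metric.
    static double systemLoadAverage1m() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    // The JDK returns a negative value when the load average is unavailable
    // (for example on platforms that do not expose it).
    static boolean isAvailable(double load) {
        return load >= 0.0;
    }

    public static void main(String[] args) {
        double load = systemLoadAverage1m();
        if (isAvailable(load)) {
            // The value is a count of runnable entities and can exceed 1.0.
            System.out.println("system load average (1m): " + load);
        } else {
            System.out.println("system load average is not available on this platform");
        }
    }
}
```

Because the value counts run-queue entries and is not bounded by 1, only the unit label differs between the two conventions; the number itself is the same.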

trask commented 1 year ago

> @trask should we revert that change?

makes sense to me

mviitane commented 1 year ago

I'm fine either way.

breedx-splk commented 10 months ago

@mviitane Is it possible to try with the 2.0.0 release and let us know if the current values are now what you expect and are aligned with the semconv? Thank you.

mviitane commented 10 months ago

> @mviitane Is it possible to try with the 2.0.0 release and let us know if the current values are now what you expect and are aligned with the semconv? Thank you.

@breedx-splk I'm trying to update adservice (in the OTel Demo App) to the 2.0.0 release. It builds just fine, but it fails to export any signals and logs this:

ad-service  | [otel.javaagent 2024-01-30 12:10:57:257 +0000] [OkHttp http://otelcol:4317/...] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export logs. The request could not be executed. Full error message: Connection reset
ad-service  | java.net.SocketException: Connection reset
ad-service  |   at java.base/sun.nio.ch.NioSocketImpl.implRead(Unknown Source)
ad-service  |   at java.base/sun.nio.ch.NioSocketImpl.read(Unknown Source)
ad-service  |   at java.base/sun.nio.ch.NioSocketImpl$1.read(Unknown Source)
ad-service  |   at java.base/java.net.Socket$SocketInputStream.read(Unknown Source)
ad-service  |   at okio.InputStreamSource.read(JvmOkio.kt:93)
ad-service  |   at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:153)
ad-service  |   at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:430)
ad-service  |   at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:323)
ad-service  |   at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29)
ad-service  |   at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:180)
ad-service  |   at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
ad-service  |   at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
ad-service  |   at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
ad-service  |   at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
ad-service  |   at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
ad-service  |   at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
ad-service  |   at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
ad-service  |   at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
ad-service  |   at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
ad-service  |   at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
ad-service  |   at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
ad-service  |   at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
ad-service  |   at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
ad-service  |   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
ad-service  |   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
ad-service  |   at java.base/java.lang.Thread.run(Unknown Source)

It looks like it tries to export HTTP to port 4317. The Collector in the Demo App expects HTTP on port 4318 and gRPC on port 4317.

The Java exporter should default to HTTP on port 4318, right? https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#otlphttp-default-port

mviitane commented 10 months ago

As I understand it, the previous OTel Java exporter sent gRPC to port 4317 by default. The new version, 2.0.0, exports HTTP by default but uses the same port, 4317.
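A protocol/port mismatch like the one described here can be spotted mechanically. The following is a rough sketch, not the exporter's actual logic: it assumes the conventional ports stated above (gRPC on 4317, HTTP on 4318) and the protocol names `grpc` and `http/protobuf`. The class and method names (`OtlpEndpointCheck`, `conventionalPort`, `looksMismatched`) are hypothetical.

```java
import java.net.URI;

public class OtlpEndpointCheck {

    // Conventional OTLP port for a protocol name, or -1 if the name is unknown.
    static int conventionalPort(String protocol) {
        switch (protocol) {
            case "grpc":
                return 4317;
            case "http/protobuf":
                return 4318;
            default:
                return -1;
        }
    }

    // True when the endpoint names an explicit port that differs from the
    // conventional port for the given protocol. Unknown protocols and
    // endpoints without an explicit port are not flagged.
    static boolean looksMismatched(String protocol, String endpoint) {
        int expected = conventionalPort(protocol);
        int actual = URI.create(endpoint).getPort(); // -1 when no port is given
        return expected != -1 && actual != -1 && actual != expected;
    }

    public static void main(String[] args) {
        // The situation from the log above: HTTP export aimed at the gRPC port.
        System.out.println(looksMismatched("http/protobuf", "http://otelcol:4317"));
        // gRPC aimed at the gRPC port.
        System.out.println(looksMismatched("grpc", "http://otelcol:4317"));
    }
}
```

A non-default port is not necessarily wrong, since a Collector can listen anywhere, so a check like this is only a hint to look at the endpoint configuration.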

trask commented 10 months ago

hi @mviitane! could this be coming from here?

https://github.com/open-telemetry/opentelemetry-demo/blob/022ba9090add200d768af7d7696a92ed6e71474f/.env#L15

mviitane commented 10 months ago

Thank you @trask! That was it. My mistake, I didn't see this. I'll fix the OTel Demo App.

@breedx-splk, I verified the JVM CPU metrics, including their units, and they were as expected. I'll close this issue.