open-telemetry / opentelemetry-demo

This repository contains the OpenTelemetry Astronomy Shop, a microservice-based distributed system intended to illustrate the implementation of OpenTelemetry in a near real-world environment.
https://opentelemetry.io/docs/demo/
Apache License 2.0

`otel-col` error gathering metrics #969

Closed julianocosta89 closed 8 months ago

julianocosta89 commented 1 year ago

Bug Report

Which version of the demo are you using? Commit 51dcd8b4f7e3e915dd3a4436715f7c4146766f25.

Symptom

The demo is up and running, but the collector logs an error every now and then.

What is the expected behavior?

I don't expect to see any errors.

What is the actual behavior?

otel-col is logging the following:

otel-col                 | 2023-07-10T12:26:37.747Z error   prometheusexporter@v0.76.3/log.go:34    error gathering metrics: collected metric rpc_server_duration_milliseconds label:<name:"container_id" value:"11a8310341a98ddfcd0698f3d96405a4d7b5a7ad48b5a12718d9a93013e1aec4" > label:<name:"host_arch" value:"amd64" > label:<name:"host_name" value:"11a8310341a9" > label:<name:"job" value:"opentelemetry-demo/adservice" > label:<name:"net_host_name" value:"adservice" > label:<name:"net_transport" value:"ip_tcp" > label:<name:"os_description" value:"Linux 5.19.0-46-generic" > label:<name:"os_type" value:"linux" > label:<name:"process_command_line" value:"/opt/java/openjdk/bin/java -javaagent:/usr/src/app/opentelemetry-javaagent.jar" > label:<name:"process_executable_path" value:"/opt/java/openjdk/bin/java" > label:<name:"process_pid" value:"1" > label:<name:"process_runtime_description" value:"Eclipse Adoptium OpenJDK 64-Bit Server VM 17.0.7+7" > label:<name:"process_runtime_name" value:"OpenJDK Runtime Environment" > label:<name:"process_runtime_version" value:"17.0.7+7" > label:<name:"rpc_grpc_status_code" value:"0" > label:<name:"rpc_method" value:"GetAds" > label:<name:"rpc_service" value:"oteldemo.AdService" > label:<name:"rpc_system" value:"grpc" > label:<name:"service_name" value:"adservice" > label:<name:"service_namespace" value:"opentelemetry-demo" > label:<name:"telemetry_auto_version" value:"1.24.0" > label:<name:"telemetry_sdk_language" value:"java" > label:<name:"telemetry_sdk_name" value:"opentelemetry" > label:<name:"telemetry_sdk_version" value:"1.24.0" > histogram:<sample_count:15 sample_sum:242.51129199999994 bucket:<cumulative_count:0 upper_bound:0 > bucket:<cumulative_count:9 upper_bound:5 exemplar:<label:<name:"trace_id" value:"e1cab24c74f3d1bd4f841db5272bca38" > label:<name:"span_id" value:"60a3d6b226dba277" > value:1.020379 timestamp:<seconds:1688991914 nanos:233037634 > > > bucket:<cumulative_count:12 upper_bound:10 
exemplar:<label:<name:"trace_id" value:"c51f99cd848144be300604e39d7e3653" > label:<name:"span_id" value:"5ccfb540bad5c009" > value:7.77985 timestamp:<seconds:1688991939 nanos:699639495 > > > bucket:<cumulative_count:13 upper_bound:25 > bucket:<cumulative_count:13 upper_bound:50 > bucket:<cumulative_count:14 upper_bound:75 > bucket:<cumulative_count:14 upper_bound:100 > bucket:<cumulative_count:15 upper_bound:250 > bucket:<cumulative_count:15 upper_bound:500 > bucket:<cumulative_count:15 upper_bound:750 > bucket:<cumulative_count:15 upper_bound:1000 > bucket:<cumulative_count:15 upper_bound:2500 > bucket:<cumulative_count:15 upper_bound:5000 > bucket:<cumulative_count:15 upper_bound:7500 > bucket:<cumulative_count:15 upper_bound:10000 > >  has help "The duration of an inbound RPC invocation" but should have ""
otel-col                 |  {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
otel-col                 | github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
otel-col                 |  github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter@v0.76.3/log.go:34
otel-col                 | github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
otel-col                 |  github.com/prometheus/client_golang@v1.15.0/prometheus/promhttp/http.go:139
otel-col                 | net/http.HandlerFunc.ServeHTTP
otel-col                 |  net/http/server.go:2122
otel-col                 | net/http.(*ServeMux).ServeHTTP
otel-col                 |  net/http/server.go:2500
otel-col                 | go.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1
otel-col                 |  go.opentelemetry.io/collector@v0.76.1/config/confighttp/compression.go:162
otel-col                 | net/http.HandlerFunc.ServeHTTP
otel-col                 |  net/http/server.go:2122
otel-col                 | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
otel-col                 |  go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp@v0.40.0/handler.go:213
otel-col                 | go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
otel-col                 |  go.opentelemetry.io/collector@v0.76.1/config/confighttp/clientinfohandler.go:39
otel-col                 | net/http.serverHandler.ServeHTTP
otel-col                 |  net/http/server.go:2936
otel-col                 | net/http.(*conn).serve
otel-col                 |  net/http/server.go:1995

Reproduce

```shell
docker system prune -a
docker compose build
docker compose up
```

austinlparker commented 1 year ago

If you look at the collector's metrics endpoint and search for the metric name, you should see one with the description and one without. It might just need a Java agent update across the board...

julianocosta89 commented 1 year ago

@austinlparker sorry for my ignorance 😅, but how do I look at the collector's metrics endpoint?

marromang commented 1 year ago

Hi, I'm using v1.4 and also came across this. Is there any way to solve it?

julianocosta89 commented 1 year ago

@trask, any ideas on how to proceed here? The error happens because the Java instrumentation sets a description on the metric: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/4eddec07fca5701e82ffbf15e70b3141181a65da/instrumentation-api-semconv/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/rpc/RpcServerMetrics.java#L42

The other services that produce the same metric, productcatalog and checkoutservice (both implemented in Go), do not set that description.

I also saw that there is an open issue about this in the collector: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20885

But I don't think the problem is there.

trask commented 1 year ago

All language instrumentations should emit the description listed in the RPC metrics semantic conventions. Then all the descriptions will match, and the collector should no longer emit this warning.

Unfortunately, it looks like none of the instrumentations are emitting the correct description (Java is emitting an old one).

I just sent https://github.com/open-telemetry/semantic-conventions/pull/275 to suggest changes to the RPC metric descriptions. Once that is resolved, I think all languages will need to update the description they emit to make this collector warning go away.

ghevge commented 1 year ago

I'm seeing another variation of what seems to be the same issue: https://github.com/open-telemetry/opentelemetry-collector/issues/8340. Any ETA on the fix? Thanks!

puckpuck commented 8 months ago

The initial issue is fixed, and we are tracking all remaining SDK metrics issues in #1147.

Closing this one.