open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0

Service instance IDs are random since 2.3.0 #11135

Closed. fede843 closed this issue 6 months ago

fede843 commented 6 months ago

Describe the bug

After updating to 2.3.0 we stopped receiving the expected value for the instance label. This affects metrics, traces and profiling, that is, every signal that passes through the OTel agent. We gather all signals per server using Grafana Agent.

Steps to reproduce

Use this basic River config for metrics:

prometheus.remote_write "default" {
    external_labels = {
        datacenter = "colo",
        env        = "beta",
        instance   = "bahia",
    }

    endpoint {
        name = "central"
        url  = "http://prometheus:9090/api/v1/write"
        send_exemplars = true
        basic_auth {
            username = "user"
            password = "password"
        }

        queue_config {
            capacity             = 2500
            max_shards           = 200
            max_samples_per_send = 500
            min_backoff          = "500ms"
            max_backoff          = "5m"
        }

        metadata_config { }
    }

    wal {
        truncate_frequency = "2h"
        max_keepalive_time = "20s"
        min_keepalive_time = "10s"
    }
}

Then run a Java app with the agent jar in Docker, using a simple configuration.

Expected behavior

With 2.2.0 or below I get metrics like this, with the proper labels:

jvm_cpu_time_seconds_total{datacenter="colo", env="beta", instance="bahia", job="eureka12"}
...

The label injection works just as expected.

Actual behavior

When moving to 2.3.0 I get

jvm_cpu_time_seconds_total{datacenter="colo", env="beta", instance="c9f99db4-5a15-4213-a765-9987ff446f30", job="eureka12"}
...

The instance label arrives with an unwanted value.

The same happens for traces.

Javaagent or library instrumentation version

2.3.0

Environment

Docker 25.0.3-ce, Grafana Agent v0.40.4, Prometheus v2.51.2. Java apps: one on Java 8, the other on Java 17.

Additional context

Most likely related to https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/11071

laurit commented 6 months ago

@zeitlinger could you take a look?

zeitlinger commented 6 months ago

The instance label is mapped from service.instance.id, which is indeed populated with a random value.
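A rough sketch of that mapping, assuming the usual OpenTelemetry-to-Prometheus convention (service.name, optionally prefixed by service.namespace, becomes job; service.instance.id becomes instance). The class and method names are hypothetical and this is not the agent's or the exporter's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TargetLabels {
    // Hypothetical helper: derives the Prometheus job/instance labels
    // from a map of OpenTelemetry resource attributes.
    public static Map<String, String> fromResource(Map<String, String> resource) {
        Map<String, String> labels = new LinkedHashMap<>();

        String name = resource.get("service.name");
        String namespace = resource.get("service.namespace");
        if (name != null) {
            // job is "<namespace>/<name>" when a namespace is present
            labels.put("job", namespace == null ? name : namespace + "/" + name);
        }

        String instanceId = resource.get("service.instance.id");
        if (instanceId != null) {
            // With 2.3.0 this attribute holds a random UUID unless it is set explicitly
            labels.put("instance", instanceId);
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, String> resource = new LinkedHashMap<>();
        resource.put("service.name", "eureka12");
        resource.put("service.instance.id", "c9f99db4-5a15-4213-a765-9987ff446f30");
        System.out.println(fromResource(resource));
    }
}
```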

You can change that value by setting the service.instance.id resource attribute. However, doing so is not recommended. The instance should be different if you have 2 different instances of your app running.
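For illustration, resource attributes can be passed to the agent as a comma-separated list of key=value pairs through the OTEL_RESOURCE_ATTRIBUTES environment variable (or the otel.resource.attributes system property). The sketch below only builds that string; the class name and the instance value "eureka12-replica-1" are made up, and it assumes keys and values contain no commas or equals signs:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ResourceAttributesArg {
    // Hypothetical helper: joins attributes into "k1=v1,k2=v2".
    // Assumes keys and values need no escaping.
    public static String format(Map<String, String> attrs) {
        return attrs.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("service.name", "eureka12");
        // Made-up value; each running replica should get its own distinct ID
        attrs.put("service.instance.id", "eureka12-replica-1");
        System.out.println("OTEL_RESOURCE_ATTRIBUTES=" + format(attrs));
    }
}
```

If the value is overridden this way, it should still differ per running instance, in line with the caveat above.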

github-actions[bot] commented 6 months ago

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 7 days. It will be closed automatically if there is no response from the author within 7 additional days from this comment.

fede843 commented 6 months ago

Yes, we are misusing the attribute. Will change it to something else. Thanks!