open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
1.93k stars 794 forks source link

fail to load OTEL_EXPORTER_OTLP_TRACES_ENDPOINT on Java 8 #5517

Closed hu6360567 closed 1 year ago

hu6360567 commented 1 year ago

Describe the bug set OTEL_EXPORTER_OTLP_TRACES_ENDPOINT envrionment varaiable without http causes java agent fail to start on java8

Steps to reproduce Use java8 to launch with agent as below

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=127.0.0.1:4317 OTEL_METRICS_EXPORTER=none
/Library/Java/JavaVirtualMachines/openjdk-8.jdk/Contents/Home/bin/java -javaagent:/path/to/opentelemetry-javaagent.jar -Dotel.resource.attributes=service.name=xxxx -Dotel.traces.exporter=otlp -Dotel.javaagent.debug=true org.example.Main

The output as below

OpenTelemetry Javaagent failed to start
io.opentelemetry.sdk.autoconfigure.spi.ConfigurationException: OTLP endpoint must be a valid URL: 127.0.0.1:4317
    at io.opentelemetry.exporter.internal.otlp.OtlpConfigUtil.validateEndpoint(OtlpConfigUtil.java:210)
    at io.opentelemetry.exporter.internal.otlp.OtlpConfigUtil.configureOtlpExporterBuilder(OtlpConfigUtil.java:66)
    at io.opentelemetry.exporter.otlp.internal.OtlpSpanExporterProvider.createExporter(OtlpSpanExporterProvider.java:52)
    at io.opentelemetry.sdk.autoconfigure.SpiUtil.lambda$loadConfigurable$0(SpiUtil.java:48)
    at io.opentelemetry.sdk.autoconfigure.NamedSpiManager.tryLoadImplementationForName(NamedSpiManager.java:46)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at io.opentelemetry.sdk.autoconfigure.NamedSpiManager.getByName(NamedSpiManager.java:36)
    at io.opentelemetry.sdk.autoconfigure.SpanExporterConfiguration.configureExporter(SpanExporterConfiguration.java:91)
    at io.opentelemetry.sdk.autoconfigure.SpanExporterConfiguration.configureSpanExporters(SpanExporterConfiguration.java:66)
    at io.opentelemetry.sdk.autoconfigure.TracerProviderConfiguration.configureTracerProvider(TracerProviderConfiguration.java:51)
    at io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdkBuilder.build(AutoConfiguredOpenTelemetrySdkBuilder.java:356)
    at io.opentelemetry.javaagent.tooling.OpenTelemetryInstaller.installOpenTelemetrySdk(OpenTelemetryInstaller.java:29)
    at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:118)
    at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:98)
    at io.opentelemetry.javaagent.tooling.AgentStarterImpl.start(AgentStarterImpl.java:98)
    at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:53)
    at io.opentelemetry.javaagent.bootstrap.AgentInitializer$1.run(AgentInitializer.java:47)
    at io.opentelemetry.javaagent.bootstrap.AgentInitializer.execute(AgentInitializer.java:64)
    at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:46)
    at io.opentelemetry.javaagent.OpenTelemetryAgent.startAgent(OpenTelemetryAgent.java:57)
    at io.opentelemetry.javaagent.OpenTelemetryAgent.premain(OpenTelemetryAgent.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
    at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.MalformedURLException: no protocol: 101.37.151.41:4317
    at java.net.URL.<init>(URL.java:611)
    at java.net.URL.<init>(URL.java:508)
    at java.net.URL.<init>(URL.java:457)
    at io.opentelemetry.exporter.internal.otlp.OtlpConfigUtil.validateEndpoint(OtlpConfigUtil.java:208)
    ... 26 more

What did you expect to see? The "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" environment variable needs to be set with "http://127.0.0.1:4317" to work as expect on Java 8. Following https://opentelemetry.io/docs/concepts/sdk-configuration/otlp-exporter-configuration/#otel_exporter_otlp_traces_endpoint, it should not start with http if the protocol is grpc, which works fine in Java 17.

What did you see instead? A clear and concise description of what you saw instead.

What version and what artifacts are you using? Artifacts: (e.g., opentelemetry-api, opentelemetry-sdk, which exporters, etc) Version: (e.g., v0.4.0, 1eb551b, etc) How did you reference these artifacts? (excerpt from your build.gradle, pom.xml, etc)

Environment Compiler: OpenJDK 1.8.0_372 OS: MacOS 12.6.6

Additional context Add any other context about the problem here.

jack-berg commented 1 year ago

The following test passes in both java 8 and java 17 environments, indicating that in both cases, the scheme is required:

    System.out.println(System.getProperty("java.version"));
    assertThatThrownBy(() -> OtlpGrpcSpanExporter.builder()
        .setEndpoint("127.0.0.1:4317"))
        .isInstanceOf(IllegalArgumentException.class);

This is aligned with the specification, which for gRPC endpoints says:

Target to which the exporter is going to send spans or metrics. The endpoint SHOULD accept any form allowed by the underlying gRPC client implementation. Additionally, the endpoint MUST accept a URL with a scheme of either http or https. A scheme of https indicates a secure connection and takes precedence over the insecure configuration setting. A scheme of http indicates an insecure connection and takes precedence over the insecure configuration setting. If the gRPC client implementation does not support an endpoint with a scheme of http or https then the endpoint SHOULD be transformed to the most sensible format for that implementation. Default: http://localhost:4317 [1]

We have our own gRPC client implementation which requires scheme to be included, so we're aligned with the spec.

hu6360567 commented 1 year ago

The following test passes in both java 8 and java 17 environments, indicating that in both cases, the scheme is required:

    System.out.println(System.getProperty("java.version"));
    assertThatThrownBy(() -> OtlpGrpcSpanExporter.builder()
        .setEndpoint("127.0.0.1:4317"))
        .isInstanceOf(IllegalArgumentException.class);

This is aligned with the specification, which for gRPC endpoints says:

Target to which the exporter is going to send spans or metrics. The endpoint SHOULD accept any form allowed by the underlying gRPC client implementation. Additionally, the endpoint MUST accept a URL with a scheme of either http or https. A scheme of https indicates a secure connection and takes precedence over the insecure configuration setting. A scheme of http indicates an insecure connection and takes precedence over the insecure configuration setting. If the gRPC client implementation does not support an endpoint with a scheme of http or https then the endpoint SHOULD be transformed to the most sensible format for that implementation. Default: http://localhost:4317 [1]

We have our own gRPC client implementation which requires scheme to be included, so we're aligned with the spec.

According to the specification, if use the environment variable to set a grpc protocol endpoint, scheme should not be included.

  • gRPC: export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="my-api-endpoint:443"
    • HTTP: export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="http://my-api-endpoint/v1/metrics"

What makes it a trouble is, setting this environment variable with scheme whould fail in python auto-instrument, which makes mess configuration in our system with python/java applications.

jack-berg commented 1 year ago

According to the specification, if use the environment variable to set a grpc protocol endpoint, scheme should not be included.

That's not correct. SDKs are required to accept endpoints with a scheme included (and infer whether TLS is enabled / disabled based on the scheme), but may also accept endpoints without a scheme. See this PR for more context: https://github.com/open-telemetry/opentelemetry-specification/pull/1729

So java could change its behavior to accept endpoints with no scheme. But doing so would open questions about whether TLS is enabled / disabled. We could support the OTEL_EXPORTER_OTLP_INSECURE variable, but what if OTEL_EXPORTER_OTLP_INSECURE is included and the endpoint includes a conflicting scheme?

If python fails when scheme is included in the endpoint, then its not spec compliant and should be fixed:

the endpoint MUST accept a URL with a scheme of either http or https.

hu6360567 commented 1 year ago

For SDK/API level, scheme should be included for sure, as the specificiation defines. But for auto-instument configuration, as you point out the OTEL_EXPORTER_OTLP_INSECURE is used to indicate whether TLS should be enabled. So the workflow should be: If OTEL_EXPORTER_OTLP_TRACES_ENDPOINT includes the scheme, OTEL_EXPORTER_OTLP_INSECURE should be ignored with a warning, otherwise, full url with scheme should be filled according to the OTEL_EXPORTER_OTLP_INSECURE value. If the endpoint vairable always comes with the scheme, then what's the use of OTEL_EXPORTER_OTLP_INSECURE for?

Anyway, I'm fine with endpoint format, as long as the value can be accepted consistently accross the ecosystem. As https://github.com/open-telemetry/opentelemetry.io/pull/2863 has been made, I'll keep on the progress.

hu6360567 commented 1 year ago

Seems the python auto-instrument has conflict with our application. It can accept the endpoint with scheme. I'll inspect the problem later.