open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.99k stars 868 forks source link

OpenTelemetry Java agent + Micrometer Tracing #7576

Open janchristian-haddorp opened 1 year ago

janchristian-haddorp commented 1 year ago

Demo application is using OpenTelemetry Java agent and Micrometer for traces customization. In some cases these custom tags/attributes are getting lost. Not sure if this is a valid setup or if there is another recommended approach?

-javaagent:lib/opentelemetry-javaagent.jar

Micrometer dependencies are added via Gradle.

dependencies {
   implementation 'org.springframework.boot:spring-boot-starter-actuator'
   implementation 'org.springframework.boot:spring-boot-starter-webflux'
   implementation 'io.micrometer:micrometer-tracing-bridge-otel'
   runtimeOnly 'io.micrometer:micrometer-registry-prometheus'
   runtimeOnly 'io.projectreactor:reactor-core-micrometer'
   compileOnly 'org.projectlombok:lombok'
   annotationProcessor 'org.projectlombok:lombok'
   testImplementation 'org.springframework.boot:spring-boot-starter-test'
   testImplementation 'io.projectreactor:reactor-test'
}

A controller method does a WebFlux request to another endpoint – to simulate an external service call. Both controller methods log the current span and attach the custom attribute the the span.

@RestController
@RequiredArgsConstructor
@Slf4j
public class WebFluxDemoController {

    private final WebClient.Builder builder;

    private final Tracer tracer;

    @GetMapping("/")
    public Mono<String> helloWorld() {
        return builder.baseUrl("http://localhost:8080").build()
                .get()
                .uri("/internal")
                .retrieve().bodyToMono(String.class)
                .handle((str, sink) -> {
                    tracer.currentSpan().tag("hello.world", str);
                    log.info("HelloWorld: Trace ID: {}, Span ID: {}",  tracer.currentSpan().context().traceId(),  tracer.currentSpan().context().spanId());
                    sink.next(str);
                });
    }

    @GetMapping("/internal")
    public Mono<String> helloWorldInternal() {
        return Mono
                .just("Hello World!")
                .handle((str, sink) -> {
                    tracer.currentSpan().tag("hello.world.internal", str);
                    log.info("HelloWorldInternal: Trace ID: {}, Span ID: {}",  tracer.currentSpan().context().traceId(),  tracer.currentSpan().context().spanId());
                    sink.next(str);
                });
    }
}

For the first controller method this is working perfectly. Span ID is matching the overall trace and custom attribute is set.

image

On the second call the added custom attribute is not displayed. The logged span ID seems to be connected to the caller method instead to the actual internal controller method 'helloWorldInternal'. Same behavior occurs if using two separate Spring Boot applications.

image

Doing the same with Spring MVC instead of WebFlux provides the expected result. In both cases Spring Boot 3.0.1 and OpenTelemetry Java agent 1.22.0 is being used.

2023-01-13T23:50:41.959+01:00  INFO 61977 --- [           main] c.e.opentelemetry.WebFluxApplication     : Starting WebFluxApplication using Java 17.0.5 with PID 61977 (/Users/jch439/Development/git/playground/spring-opentelemetry/opentelemetry-webflux/build/classes/java/main started by jch439 in /Users/jch439/Development/git/playground/spring-opentelemetry)
2023-01-13T23:50:41.965+01:00  INFO 61977 --- [           main] c.e.opentelemetry.WebFluxApplication     : The following 1 profile is active: "main"
2023-01-13T23:50:42.717+01:00 ERROR 61977 --- [           main] i.n.r.d.DnsServerAddressStreamProviders  : Unable to load io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider, fallback to system defaults. This may result in incorrect DNS resolutions on MacOS. Check whether you have a dependency on 'io.netty:netty-resolver-dns-native-macos'. Use DEBUG level to see the full stack: java.lang.UnsatisfiedLinkError: failed to load the required native library
2023-01-13T23:50:42.960+01:00  INFO 61977 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 1 endpoint(s) beneath base path '/actuator'
2023-01-13T23:50:43.229+01:00  INFO 61977 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2023-01-13T23:50:43.239+01:00  INFO 61977 --- [           main] c.e.opentelemetry.WebFluxApplication     : Started WebFluxApplication in 1.425 seconds (process running for 2.459)
2023-01-13T23:50:55.404+01:00  INFO 61977 --- [ctor-http-nio-4] c.e.opentelemetry.WebFluxDemoController  : HelloWorldInternal: Trace ID: bc1c11fc981527732d70799403f6b843, Span ID: 8c3df081fc23f34c
2023-01-13T23:50:55.439+01:00  INFO 61977 --- [ctor-http-nio-3] c.e.opentelemetry.WebFluxDemoController  : HelloWorld: Trace ID: bc1c11fc981527732d70799403f6b843, Span ID: c8d872af11b76078
trask commented 1 year ago

hi @janchristian-haddorp! this repository doesn't currently have any tests or support for Micrometer Tracing. In general we would recommend to use OpenTelemetry tracing API directly (instead of indirectly via the Micrometer Tracing bridge) if you have that option.

janchristian-haddorp commented 1 year ago

Ok, hoped usage of Micrometer Tracing would be possible as this would give addional flexibility. Especially as mentiong on some tutorials, e.g. Observability with Spring Boot 3.

However, switching to OpenTelemetry API really helps - at least at first sight. When only using org.springframework.boot:spring-boot-starter-webflux tracing result is how expected. But if also adding org.springframework.boot:spring-boot-starter-web (or indirectly e.g. by using org.springframework.boot:spring-boot-starter-data-rest) some custom attriubutes are getting lost.

Updated controller - now using OpenTelemetry API

@RestController
@RequiredArgsConstructor
@Slf4j
public class WebFluxDemoController {

    private final WebClient.Builder builder;

    @GetMapping("/")
    public Mono<String> helloWorld() {
        return builder.baseUrl("http://localhost:8080").build()
                .get()
                .uri("/internal")
                .retrieve().bodyToMono(String.class)
                .handle((str, sink) -> {
                    var span = Span.current();
                    span.setAttribute("hello.world", str);
                    log.info("HelloWorld: Trace ID: {}, Span ID: {}",  span.getSpanContext().getTraceId(),  span.getSpanContext().getSpanId());
                    sink.next(str);
                });
    }

    @GetMapping("/internal")
    public Mono<String> helloWorldInternal() {
        return Mono
                .just("Hello World!")
                .handle((str, sink) -> {
                    var span = Span.current();
                    span.setAttribute("hello.world.internal", str);
                    log.info("HelloWorldInternal: Trace ID: {}, Span ID: {}",  span.getSpanContext().getTraceId(),  span.getSpanContext().getSpanId());
                    sink.next(str);
                });
    }
}

The logs clearly show the two affected spans.

2023-01-15T23:20:40.272+01:00  INFO 21014 --- [           main] c.e.opentelemetry.WebFluxApplication     : Starting WebFluxApplication using Java 17.0.5 with PID 21014 (/Users/jch439/Development/git/playground/spring-opentelemetry/opentelemetry-webflux/build/classes/java/main started by jch439 in /Users/jch439/Development/git/playground/spring-opentelemetry)
2023-01-15T23:20:40.278+01:00  INFO 21014 --- [           main] c.e.opentelemetry.WebFluxApplication     : No active profile set, falling back to 1 default profile: "default"
2023-01-15T23:20:40.925+01:00  INFO 21014 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2023-01-15T23:20:40.938+01:00  INFO 21014 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-01-15T23:20:40.938+01:00  INFO 21014 --- [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.4]
2023-01-15T23:20:40.978+01:00  INFO 21014 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2023-01-15T23:20:40.978+01:00  INFO 21014 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 668 ms
2023-01-15T23:20:41.341+01:00 ERROR 21014 --- [           main] i.n.r.d.DnsServerAddressStreamProviders  : Unable to load io.netty.resolver.dns.macos.MacOSDnsServerAddressStreamProvider, fallback to system defaults. This may result in incorrect DNS resolutions on MacOS. Check whether you have a dependency on 'io.netty:netty-resolver-dns-native-macos'. Use DEBUG level to see the full stack: java.lang.UnsatisfiedLinkError: failed to load the required native library
2023-01-15T23:20:41.568+01:00  INFO 21014 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 1 endpoint(s) beneath base path '/actuator'
2023-01-15T23:20:41.621+01:00  INFO 21014 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2023-01-15T23:20:41.629+01:00  INFO 21014 --- [           main] c.e.opentelemetry.WebFluxApplication     : Started WebFluxApplication in 1.507 seconds (process running for 2.54)
2023-01-15T23:20:55.803+01:00  INFO 21014 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-01-15T23:20:55.803+01:00  INFO 21014 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2023-01-15T23:20:55.804+01:00  INFO 21014 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : Completed initialization in 1 ms
2023-01-15T23:20:56.218+01:00  INFO 21014 --- [nio-8080-exec-2] c.e.opentelemetry.WebFluxDemoController  : HelloWorldInternal: Trace ID: 91d1e352d9d89919581594b25d5ffa20, Span ID: 6b6729178c5eadef
2023-01-15T23:20:56.271+01:00  INFO 21014 --- [ctor-http-nio-2] c.e.opentelemetry.WebFluxDemoController  : HelloWorld: Trace ID: 91d1e352d9d89919581594b25d5ffa20, Span ID: 26f188226664c44d

However, the attribute set on the first method 'helloWorld' is missing.

image

Where the custom attribute for the second span is clearly shown (method 'helloWorldInternal').

image

Any idea how/why the Spring MVC dependency (org.springframework.boot:spring-boot-starter-web) is effecting the reactive WebClient tracing?

trask commented 1 year ago

@janchristian-haddorp can you post a github repo with an example that we can run and look into what's happening?

janchristian-haddorp commented 1 year ago

@trask, demo repo has be created... https://github.com/janchristian-haddorp/opentelemetry/tree/issue-7576

Hope this helps reproducing my case. If anything is missing I will add asap.

arana3 commented 1 year ago

Is this issue not related to #7436?

when that issue is resolved, there will be end to end baggage + trace context propagation on WebFlux server-side (not just WebClient itself).

Also mixing both reactive and non-reactive web dependencies in same app is not ideal in my opinion. You'd want to make sure only one server engine is running for a given app.

janchristian-haddorp commented 1 year ago

@arana3, the described case can be still reproduced - using latest Spring Boot 3.0.4 release.

I agree, mixing reactive and non-reactive web dependencies is not good praxis. On the other hand there is no warning that this might lead to conflicts. Also a challenge as long as not all libs/tools are fully reactive.