micrometer-metrics / context-propagation

Context Propagation API
Apache License 2.0
83 stars 23 forks source link

traceId and spanId missing in access logs #212

Closed oemergenc closed 7 months ago

oemergenc commented 7 months ago

Hi,

I am not sure wether this is a problem in the context-propagation or somewhere else, so I am trying my luck here. I am trying to set up a very simple spring cloud gateway application and I want to log traceIds and spanId. I was able to create a simple example which seems to work, although I am not sure wether I have configured context propagation the right way. Anyway, there is one test failing due to the fact that the traceId and spanId are missing from one request and I cannot understand why. Please have a look at this repo:

https://github.com/oemergenc/spring-cloud-gateway-upgrade

I have added a single integration test which is success for one request and fails for another and I have no idea why. The test is here: https://github.com/oemergenc/spring-cloud-gateway-upgrade/blob/a79b683f7d6dd4194cdf21029c256471b550f915/src/test/java/com/gateway/ItTest.java#L13

Can anyone help me with that?

It would also be nice, if someone can check wether my configuration here: https://github.com/oemergenc/spring-cloud-gateway-upgrade/blob/e3045ee2137d56b0fe0f7751409e48f39bce7f74/src/main/java/com/gateway/configuration/TracingConfiguration.java#L27 is correct to have working traceId and spanId propagation?

Thanks in advance.

oemergenc commented 7 months ago

Hi, any help on this would be really great. May I ask this question somewhere else?

jonatan-ivanov commented 7 months ago

I'm not an expert of Spring Cloud Gateway or Reactor-Netty, but I think this could be an issue with the instrumentation of one of them. The tracing information might be missing from those logs since the code that does the logging is not "in-scope" (does not have access to the tracing information/context).

Could you please try to improve on the reproducer a bit?

/cc @violetagg Should we move this issue into Reactor-Netty instead?

violetagg commented 7 months ago

Let me first take a look.

No Reactor Netty is not gonna provide this.

More info https://github.com/reactor/reactor-netty/issues/3003

violetagg commented 7 months ago

@oemergenc You need to enable the tracing for Reactor Netty. The code below is just a sample. Please replace the Function.identity() with an appropriate function (check the javadoc to understand why it is important to provide this function https://projectreactor.io/docs/netty/release/api/reactor/netty/http/server/HttpServer.html#metrics-boolean-java.util.function.Function-):

  @Bean
  public NettyServerCustomizer defaultNettyServerCustomizer() {
    return server ->
        server
            .accessLog(true)
            .metrics(true, Function.identity())
            .doOnConnection(
                connection -> {
                  connection.addHandlerLast(tracingChannelDuplexHandler());
                });
  }
oemergenc commented 7 months ago

Hi @violetagg @jonatan-ivanov ,

thanks for your participation. I cleaned the repo a little bit up and tried to be as minimal as possible:I

All of this fixed the failing test. Thank you for your help.

Unfortunately I still need to say that I find the needed configuration very hard to understand. All of this is needed just to have traceIds and spanIds working. It feels like something will break with the next minor or patch update of any involved library.

Is this really the simplest way and most minimal way of getting tracing to work in reactor netty stack? Why do I need to create the TracingChannelDuplexHandler with a ContextSnapshotFactory on my own?

violetagg commented 7 months ago

Why do I need to create the TracingChannelDuplexHandler with a ContextSnapshotFactory on my own?

You need to manage this by yourself because it is too expensive to enable tracing for every possible event that Netty may send. You can decide which events are most important for your use case and enable the tracing only for those.

jonatan-ivanov commented 7 months ago

Since it seems it's working now and this issue has not much to do with the context-propagation project, let me close this issue. @oemergenc If you feel that docs could be improved around this, please feel free to open an issue (or better: a PR) for Reactor Netty. Maybe another one for Spring Boot to auto-configure this and add a property to enable/disable access logs and instrumentation, that way you might be able to get away by flipping properties.