Micrometer Tracing Too Granular

jimbogithub commented 1 year ago

For a simple HTTP Server call trace we are seeing multiple Spans with two of them having a dozen or so attached Events. This feels like way too much detail for "application level" tracing. It is almost exclusively "framework" level tracing which as the application developer I have little interest in or control over. I'm relying on you guys to keep up the good work on that front.

We see Spans for:

the expected http get/post url...
security filterchain before with a dozen attached ...Filter before Events
secured request
security filterchain after with a dozen attached ...Filter after Events

I'm guessing there are probably ways to filter them out but I think the default behaviour should be less granular.

[Spring-Boot 3.0.2]

wilkinsona commented 1 year ago

This feels quite subjective to me and an area where it won't be possible to please all of the people all of the time. I've asked the observability team to take a look and share their opinions.

jonatan-ivanov commented 1 year ago

@jimbogithub Could you please give us a screenshot about your tracing UI showing us the parts you found too granular? Also, what is the underlying issue (is there any)? Is it messing up your UI, are more important details lost because of them? Or you are not interested in them?

jimbogithub commented 1 year ago

A few screenshots...

Jaeger for a simple HTTP request that does one JDBC call. There are 6 Spans when 2 would be sufficient and there are 14 Events each attached to 2 of the Spans:

DataDog for a simple HTTP request (prometheus metric scrape). 5 Spans when I think 1 is sufficient. (DataDog hasn't caught up yet with displaying the Events properly):

Issues:

Yes it does over complicate the FlameGraph displays. The above examples are the most trivial flows but have at least 5 Spans. In real world cases a Trace for a business level activity will have a series of REST, REST client, JDBC, JMS, Kafka etc Spans distributed across multiple services and these additional "framework" Spans are noise.
Where's my Tag? If I add a Tag to the current Span during request processing it ends up on the "secured request" Span when all the other interesting Tags (URI, Status etc) are on the initial Span. I have to hunt in multiple places to find the info I want. Search tools will also not be able to find Spans of interest if the Tags we are searching on are on different Spans.
Costs. There will be CPU, network and storage costs both in terms of performance and dollars (e.g. DataDog Span retention) for these Spans and Events. Some I can mitigate by discarding Spans but should they really be there in the first place?

Tracing is awesome. The ability to see the distributed application flow and identify latency, bottlenecks, call frequencies, changes in behaviour over time etc is great. But I don't need to know that the Spring HeaderWriterFilter took 9.6 microseconds. I get that the Spring developers might want to know that so maybe you have some config to turn it on and off but I doubt it's what most app developers are looking for and should be off by default.

jonatan-ivanov commented 1 year ago

TL;DR: All of these can be valid for other users but you can turn everything off, see the last paragraph.

~There are X Spans when Y would be sufficient

I think this really depends on your use-case. Different users have different needs but I agree that the http span is likely more important than the security spans for most of the users. Maybe we should consider adding a flag that is opt-out by default and you need to opt-in if you want this data.

there are 14 Events each attached to 2 of the Spans

This is exactly why I asked for a screenshot: I thought something is messed up and they are polluting other spans. Please notice that you don't need to look at these events on your UI, they should be hidden from you under the security spans. This can also be solved by the flag I mentioned before.

DataDog for a simple HTTP request (prometheus metric scrape). 5 Spans when I think 1 is sufficient.

Again, you might think 1 is sufficient but one can argue that 0 should be sufficient since they don't want to observe that Prometheus scrapes their app. Also some can argue that 1 is not sufficient because this way it is visible if the Prometheus endpoint is secured (and how). Also, since these security components can call out to other components over the network (LDAP, DB, Auth server) or just slow by design, this information actually can be useful.

Yes it does over complicate the FlameGraph displays...

By FlameGraph you mean the trace vies with the spans and timeline or do you have an actual FlameGraph?

Where's my Tag? If I add a Tag to the current Span...

You should not add tags to the current Span you should add them to the current Observation but in both cases you can get the parent and add the tags to it. Also, there are components that you can inject and provide the data you want, you don't necessarily need to manually get the Observation/Span and tag it.

But I don't need to know that the Spring HeaderWriterFilter took 9.6 microseconds.

I get that but please understand that there are users who might. Although I agree that being this on by default is arguable but to me the value of it in certain scenarios is not a question.

Costs. There will be CPU, network and storage costs both in terms of performance and dollars

Please look at the ObservationPredicate component. With the help of it, you can disable the observations you want, like security (but also actuator, etc). If you do this, the observation will be no-op makink the cost ~zero. Fyi: we are also thinking about introducing some sort of granularity concept to the Observation API (think about log levels) which might be similar to the optin/out flag but with finer control.

jimbogithub commented 1 year ago

@jonatan-ivanov Thanks for the detailed info. I've solved the "granularity" issues with:

    @Bean
    ObservationRegistryCustomizer<ObservationRegistry> observationRegistryCustomizer() {
        return (registry) -> registry.observationConfig()
                .observationPredicate(ObservationPredicates.noSpringSecurity())
                .observationPredicate(ObservationPredicates.noActuator());
    }

import org.springframework.http.server.observation.ServerRequestObservationContext;
import io.micrometer.observation.ObservationPredicate;

public final class ObservationPredicates {

    private ObservationPredicates() {
    }

    public static ObservationPredicate noSpringSecurity() {
        return (name, context) -> !name.startsWith("spring.security.");
    }

    public static ObservationPredicate noActuator() {
        return (name, context) -> {
            if (context instanceof ServerRequestObservationContext srCtx) {
                return !srCtx.getCarrier().getRequestURI().startsWith("/actuator");
            }
            return true;
        };
    }

}

I've also extended the org.springframework.http.server.observation.DefaultServerRequestObservationConvention to inject additional Tags via KeyValues (also related to https://github.com/spring-projects/spring-boot/issues/34132).

wilkinsona commented 1 year ago

@jimbogithub Good to hear you've got things working in a way that suits you. Thanks for letting us know. I'll close this one now as I don't think there's anything that we should change in Spring Boot. If we get more feedback that the defaults are to granular we can reconsider the defaults in the future, but they won't change until a new minor release (at the earliest) along with a way to restore the current behavior.

alexismanin commented 3 months ago

Adding my opinion to the debate : the Spring security "secured request" span causes me trouble when I want to get or amend the root HTTP request span, as any component (web filter, rest controller, etc.) that want to access the root http request span must run before the security filter (which is nearly impossible except for a filter with higher precedence order).

I think it is good to add a 'security filter chain before', to be able to identify/isolate any security related action. But once security checks are done, we should resume on the base HTTP span, not force the rest of the filter chain to run inside a spring security based span ('secured request').

For clarity, here is an example: in the below trace, 1 Is the root span containing HTTP request information, 2 is useful, it encloses exactly the security chain, but 3 is annoying because it "hides" the HTTP span and make it difficult to access it. Also, it does not add much value, I think. If the security filter is successful, we resume HTTP request processing, I am really not sure we require a "secured request" span.

spring-security-span

I wonder if other people have the same opinion, or if the "secured request" span is useful for some use-cases ?

jonatan-ivanov commented 3 months ago

Adding my opinion to the debate : the Spring security "secured request" span causes me trouble when I want to get or amend the root HTTP request span, as any component (web filter, rest controller, etc.) that want to access the root http request span must run before the security filter (which is nearly impossible except for a filter with higher precedence order).

I might not understand this or "nearly impossible" might be an overstatement, this should be possible in multiple ways:

Use an ObservationFilter or an ObservationConventon to modify the HTTP Observation
Use a SpanFilter to modify the HTTP Span
Call observation.getParent() recursively until it returns null to get the root (HTTP) Observation

not force the rest of the filter chain to run inside a spring security based span ('secured request').

As far as I know, the secured request Observation is there so that you can see if there any "rogue" operations that are not falling under the scope of authentication/authorization (not secured). You can easily disable it via an ObservationPredicate.

What is useful/unnecessary varies a lot, what feels annoying to you might be useful to other users and there are users who disable the whole security observability feature entirely that's why there are many options for you to customize the behavior (see above).

spring-projects / spring-boot

Micrometer Tracing Too Granular #34133