jdbc-observations / datasource-micrometer

Micrometer Observation API instrumentation and Spring Boot 3 Auto-Configuration for JDBC APIs
Apache License 2.0
50 stars 8 forks source link

Possible memory leak and failure to restore the observation scope when using a JPA repository from a GraphQL controller #41

Open nikb1tamerfdp opened 6 days ago

nikb1tamerfdp commented 6 days ago

The context

In my application (Spring Boot 3.2.6), I'm using the GraphQL, Web and JPA starters. I have created a GraphQL controller, in which I interact with a JPA repository. I have also enabled observability with Micrometer.

The issue

When I query my controller, everything executes successfully, but I can find this log line (formatted for clarity purpose) :

[...] WARN [...] --- [nio-8080-exec-3] [app=My Application, traceId=7f1a080fea0a3662e74ce0f85d00664b, spanId=3f4b468cb5bccffd] i.m.o.c.ObservationThreadLocalAccessor :
Observation <{
    name=http.server.requests(null),
    error=null,
    context=name='http.server.requests',
    contextualName='null',
    error='null',
    lowCardinalityKeyValues=[
        exception='none',
        method='POST',
        outcome='SUCCESS',
        status='200',
        uri='UNKNOWN'
    ],
    highCardinalityKeyValues=[
        http.url='/graphql'
    ],
    map=[
        class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
            span=ImmutableSpanContext{
                traceId=7f1a080fea0a3662e74ce0f85d00664b,
                spanId=6664922c03ad4f85,
                traceFlags=01,
                traceState=ArrayBasedTraceState{entries=[]}, 
                remote=false, 
                valid=true
            }
        }',
        class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
            duration(seconds)=0.0147542, 
            duration(nanos)=1.47542E7, 
            startTimeNanos=29155328641800
        }',
        class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3b0b7fc2'
    ],
    parentObservation=null
}>
to which we're restoring is not the same as the one set as this scope's parent observation
<{
    name=graphql.datafetcher(null), 
    error=null, 
    context=name='graphql.datafetcher', 
    contextualName='null', 
    error='null', 
    lowCardinalityKeyValues=[
        graphql.error.type='NONE', 
        graphql.field.name='retrieveBookById', 
        graphql.outcome='SUCCESS'
    ], 
    highCardinalityKeyValues=[
        graphql.field.path='/retrieveBookById'
    ],
    map=[
        class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
            span=ImmutableSpanContext{
                traceId=7f1a080fea0a3662e74ce0f85d00664b, 
                spanId=a4204ff90062fd5a, 
                traceFlags=01, 
                traceState=ArrayBasedTraceState{entries=[]}, 
                remote=false, 
                valid=true
            }
        }', 
        class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
            duration(seconds)=0.0108987, 
            duration(nanos)=1.08987E7, 
            startTimeNanos=29155332569100
        }', 
        class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@48ad6d4a'
    ], 
    parentObservation={
        name=graphql.request(null), 
        error=null, 
        context=name='graphql.request', 
        contextualName='null', 
        error='null', 
        lowCardinalityKeyValues=[
            graphql.operation='query', 
            graphql.outcome='INTERNAL_ERROR'
        ], 
        highCardinalityKeyValues=[
            graphql.execution.id='3bb1672e-660d-bb96-d7c5-4bd6ceb56cd6'
        ], 
        map=[
            class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
                span=ImmutableSpanContext{
                    traceId=7f1a080fea0a3662e74ce0f85d00664b, 
                    spanId=deeb0dd86af7c990, 
                    traceFlags=01, 
                    traceState=ArrayBasedTraceState{entries=[]}, 
                    remote=false, 
                    valid=true
                }
            }', 
            class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
                duration(seconds)=0.0138164, 
                duration(nanos)=1.38164E7, 
                startTimeNanos=29155329694600
            }', 
            class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3930f5ed'
        ], 
        parentObservation={
            name=http.server.requests(null), 
            error=null, 
            context=name='http.server.requests', 
            contextualName='null', 
            error='null', 
            lowCardinalityKeyValues=[
                exception='none', 
                method='POST', 
                outcome='SUCCESS', 
                status='200', 
                uri='UNKNOWN'
            ], 
            highCardinalityKeyValues=[
                http.url='/graphql'
            ], 
            map=[
                class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
                    span=ImmutableSpanContext{
                        traceId=7f1a080fea0a3662e74ce0f85d00664b, 
                        spanId=6664922c03ad4f85, 
                        traceFlags=01, 
                        traceState=ArrayBasedTraceState{entries=[]}, 
                        remote=false, valid=true
                    }
                }',
                class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
                    duration(seconds)=0.0149397, 
                    duration(nanos)=1.49397E7, 
                    startTimeNanos=29155328641800
                }', 
                class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3b0b7fc2'
            ], 
            parentObservation=null
        }
    }
}>.
Most likely a manually created Observation has a scope opened that was never closed. This may lead to thread polluting and memory leaks.

When I connect my application to an Open Telemetry with a Jaeger backend, just to check if everything is ok in terms of trace and spans, it seems that yeah, everything is ok : Issue - Jaeger

This log is not generated if I remove interaction with my JPA repository. It's also not generated if I annotate the method of my controller (the one annotated with @QueryMapping) with @Observed, or if I set the property spring.jpa.open-in-view to false (something I prefer not to do).

How to reproduce

You can find a sample application right here.

You can query the controller with these information :

Additional comments

In the original ticket that I've created on Spring Boot project side, @bclozel added this comment, maybe it can give more info :

I'm not sure about the actual source of the problem, but I think that the JPA instrumentation assumes something that is invalid in this case here. First, a bit a background:

  • Spring for GraphQL instruments your controller methods by setting/restoring thread locals before/after calling it. Around this method execution, things can be executed asynchronously on various threads, so we don't rely on threadlocals but we store the current observation in the GraphQL context. I have checked that this behavior is correct.
  • when a controller method is executed, child observations can be started and they will be shown in the resulting traces

In this case, I think that a specific observation is started with the call to the JPA repository and that somehow this child observation is not closed before the GraphQL data fetching observation (your controller call) is ended. I think this observation is stopped later. While this probably won't cause leaks, the trace might look strange.

I suspect this is due to the datasource observation, keeping things opened as long as the JPA session is opened.

ttddyy commented 3 days ago

Thanks for the report.

I ran the code with ObservationTextPublisher, and here are the trimmed outputs for the observation lifecycle:

With spring.jpa.open-in-view=true (default)

START - name='http.server.requests'
 OPEN - name='http.server.requests'
START - name='graphql.request'
START - name='graphql.datafetcher'
 OPEN - name='graphql.datafetcher'
... my.issue.controllers.BooksController     : Trying to retrieve book with ID 'book-1'.
START - name='jdbc.connection'
 OPEN - name='jdbc.connection'
EVENT - CONNECTION_ACQUIRED
START - name='jdbc.query'
 OPEN - name='jdbc.query'
 STOP - name='jdbc.query'
CLOSE - name='jdbc.query'
START - name='jdbc.result-set'
 OPEN - name='jdbc.result-set'
 STOP - name='jdbc.result-set'
CLOSE - name='jdbc.result-set'
EVENT - CONNECTION_COMMIT
... Observation <name=http.server.requests(null),...    <=== WARNING
CLOSE - name='jdbc.connection'
 STOP - name='graphql.datafetcher'
 STOP - name='graphql.request'
 STOP - name='jdbc.connection'
CLOSE - name='jdbc.connection'
CLOSE - name='http.server.requests'
 STOP - name='http.server.requests'

With spring.jpa.open-in-view=false:

START - name='http.server.requests'
 OPEN - name='http.server.requests'
START - name='graphql.request'
START - name='graphql.datafetcher'
 OPEN - name='graphql.datafetcher'
... my.issue.controllers.BooksController     : Trying to retrieve book with ID 'book
START - name='jdbc.connection'
 OPEN - name='jdbc.connection'
EVENT - CONNECTION_ACQUIRED
START - name='jdbc.query'
 OPEN - name='jdbc.query'
 STOP - name='jdbc.query'
CLOSE - name='jdbc.query'
START - name='jdbc.result-set'
 OPEN - name='jdbc.result-set'
 STOP - name='jdbc.result-set'
CLOSE - name='jdbc.result-set'
EVENT - CONNECTION_COMMIT
 STOP - name='jdbc.connection'
CLOSE - name='jdbc.connection'
CLOSE - name='graphql.datafetcher'
 STOP - name='graphql.datafetcher'
 STOP - name='graphql.request'
CLOSE - name='http.server.requests'
 STOP - name='http.server.requests'

With spring.jpa.open-in-view=true (default), it delays closing the entity manager, which means the JDBC connection remains open for lazy-loading entities in the view.

As the output shows, observations start in the order of http.server.requests -> graphql.request -> graphql.datafetcher -> jdbc.connection -> ... The observations should stop in the reverse order, but due to the open session in view pattern, jdbc.connection stops after graphql.datafetcher and graphql.request.

I think this may be causing the ObservationThreadLocalAccessor to print out the WARN log. Though, I'm not sure what would be a solution here. At DataSource level observation is started/stopped when JDBC connection is acquired and closed. At spring controller level, an observation is started/closed based on the http request lifecycle. The open-session/entity-manger-in-view pattern kinds of breaking the way they are invoked.

@bclozel I also noticed from the output that the graphql.datafetcher observation is not CLOSED when open-in-view is enabled. Could it be related?

bclozel commented 3 days ago

@ttddyy I'm not familiar with the difference between CLOSE and STOP. From an instrumentation perspective, GraphQlObservationInstrumentation is always calling observation.stop() and the context decorator is always wrapping the execution as it should for context propagation.