spring-projects / spring-graphql

Spring Integration for GraphQL
https://spring.io/projects/spring-graphql
Apache License 2.0
1.53k stars 303 forks source link

NullPointerException with GraphQlSseHandler in case of async timeouts #1067

Closed jjavery closed 5 days ago

jjavery commented 6 days ago

I'm using graphql-sse to connect to a basic Spring GraphQL @SubscriptionMapping that returns a Flux<>. I'm seeing a Null Pointer Exception in the log every 30 seconds.

 reactor.core.publisher.Operators         : Operator called default onErrorDropped

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.connector.OutputBuffer.isBlocking()" because "this.ob" is null
        at org.apache.catalina.connector.CoyoteOutputStream.checkNonBlockingWrite(CoyoteOutputStream.java:154) ~[tomcat-embed-core-10.1.30.jar:10.1.30]

Here's a Spring Initializr project demonstrating the problem:

https://github.com/jjavery/graphql-subscription-server-sent-events-npe

To reproduce: bootRun and open a browser to http://localhost:8080/ . Every 30 seconds you'll see the NPE in the log. Also there are unexpected warnings: "Ignoring exception, response committed already: org.springframework.web.context.request.async.AsyncRequestTimeoutException"

Includes the bug-fixed 1.3.3 version of GraphQlSseHandler.java copied from this commit: https://github.com/spring-projects/spring-graphql/commit/d3cd569e940479db7a2aab0a9b12aa81f1206ced

The 30 seconds comes from Tomcat's asyncTimeout. See TomcatConfiguration.java to change the asyncTimeout.

Log:

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/

 :: Spring Boot ::                (v3.3.4)

2024-10-15T14:09:03.879-05:00  INFO 33774 --- [  restartedMain] com.example.demo.DemoApplication         : Starting DemoApplication using Java 17.0.12 with PID 33774 (/Users/jamie/Downloads/demo/bin/main started by jamie in /Users/jamie/Downloads/demo)
2024-10-15T14:09:03.881-05:00  INFO 33774 --- [  restartedMain] com.example.demo.DemoApplication         : No active profile set, falling back to 1 default profile: "default"
2024-10-15T14:09:03.909-05:00  INFO 33774 --- [  restartedMain] .e.DevToolsPropertyDefaultsPostProcessor : Devtools property defaults active! Set 'spring.devtools.add-properties' to 'false' to disable
2024-10-15T14:09:03.909-05:00  INFO 33774 --- [  restartedMain] .e.DevToolsPropertyDefaultsPostProcessor : For additional web related logging consider setting the 'logging.level.web' property to 'DEBUG'
2024-10-15T14:09:04.376-05:00  INFO 33774 --- [  restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2024-10-15T14:09:04.384-05:00  INFO 33774 --- [  restartedMain] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2024-10-15T14:09:04.384-05:00  INFO 33774 --- [  restartedMain] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.30]
2024-10-15T14:09:04.404-05:00  INFO 33774 --- [  restartedMain] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2024-10-15T14:09:04.405-05:00  INFO 33774 --- [  restartedMain] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 495 ms
2024-10-15T14:09:04.557-05:00  INFO 33774 --- [  restartedMain] efaultSchemaResourceGraphQlSourceBuilder : Loaded 1 resource(s) in the GraphQL schema.
2024-10-15T14:09:04.650-05:00  INFO 33774 --- [  restartedMain] o.s.b.a.g.GraphQlAutoConfiguration       : GraphQL schema inspection:
        Unmapped fields: {}
        Unmapped registrations: {}
        Unmapped arguments: {}
        Skipped types: []
2024-10-15T14:09:04.681-05:00  INFO 33774 --- [  restartedMain] o.s.b.a.w.s.WelcomePageHandlerMapping    : Adding welcome page: class path resource [static/index.html]
2024-10-15T14:09:04.725-05:00  INFO 33774 --- [  restartedMain] s.b.a.g.s.GraphQlWebMvcAutoConfiguration : GraphQL endpoint HTTP POST /graphql
2024-10-15T14:09:04.779-05:00  WARN 33774 --- [  restartedMain] o.s.b.d.a.OptionalLiveReloadServer       : Unable to start LiveReload server
2024-10-15T14:09:04.795-05:00  INFO 33774 --- [  restartedMain] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2024-10-15T14:09:04.800-05:00  INFO 33774 --- [  restartedMain] com.example.demo.DemoApplication         : Started DemoApplication in 1.118 seconds (process running for 1.354)
2024-10-15T14:09:09.366-05:00  INFO 33774 --- [nio-8080-exec-2] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring DispatcherServlet 'dispatcherServlet'
2024-10-15T14:09:09.366-05:00  INFO 33774 --- [nio-8080-exec-2] o.s.web.servlet.DispatcherServlet        : Initializing Servlet 'dispatcherServlet'
2024-10-15T14:09:09.367-05:00  INFO 33774 --- [nio-8080-exec-2] o.s.web.servlet.DispatcherServlet        : Completed initialization in 1 ms
2024-10-15T14:09:39.809-05:00  WARN 33774 --- [nio-8080-exec-3] .w.s.m.s.DefaultHandlerExceptionResolver : Ignoring exception, response committed already: org.springframework.web.context.request.async.AsyncRequestTimeoutException
2024-10-15T14:09:39.810-05:00  WARN 33774 --- [nio-8080-exec-3] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.web.context.request.async.AsyncRequestTimeoutException]
2024-10-15T14:09:40.481-05:00 ERROR 33774 --- [     parallel-1] reactor.core.publisher.Operators         : Operator called default onErrorDropped

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.connector.OutputBuffer.isBlocking()" because "this.ob" is null
        at org.apache.catalina.connector.CoyoteOutputStream.checkNonBlockingWrite(CoyoteOutputStream.java:154) ~[tomcat-embed-core-10.1.30.jar:10.1.30]
        at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:101) ~[tomcat-embed-core-10.1.30.jar:10.1.30]
        at org.springframework.web.context.request.async.StandardServletAsyncWebRequest$LifecycleServletOutputStream.write(StandardServletAsyncWebRequest.java:398) ~[spring-web-6.1.13.jar:6.1.13]
        at java.base/java.io.OutputStream.write(OutputStream.java:127) ~[na:na]
        at org.springframework.web.servlet.function.SseServerResponse$DefaultSseBuilder.send(SseServerResponse.java:143) ~[spring-webmvc-6.1.13.jar:6.1.13]
        at org.springframework.web.servlet.function.SseServerResponse$DefaultSseBuilder.writeString(SseServerResponse.java:206) ~[spring-webmvc-6.1.13.jar:6.1.13]
        at org.springframework.web.servlet.function.SseServerResponse$DefaultSseBuilder.data(SseServerResponse.java:194) ~[spring-webmvc-6.1.13.jar:6.1.13]
        at com.example.demo.GraphQlSseHandler133$SseSubscriber.hookOnComplete(GraphQlSseHandler133.java:135) ~[main/:na]
        at com.example.demo.GraphQlSseHandler133$SseSubscriber.hookOnError(GraphQlSseHandler133.java:129) ~[main/:na]
        at reactor.core.publisher.BaseSubscriber.onError(BaseSubscriber.java:180) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.BaseSubscriber.onNext(BaseSubscriber.java:163) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:251) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122) ~[reactor-core-3.6.10.jar:3.6.10]
        at graphql.execution.reactive.CompletionStageMappingPublisher$CompletionStageSubscriber.lambda$whenNextFinished$0(CompletionStageMappingPublisher.java:100) ~[graphql-java-22.1.jar:na]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[na:na]
        at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:887) ~[na:na]
        at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2325) ~[na:na]
        at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:144) ~[na:na]
        at graphql.execution.reactive.CompletionStageMappingPublisher$CompletionStageSubscriber.onNext(CompletionStageMappingPublisher.java:88) ~[graphql-java-22.1.jar:na]
        at reactor.core.publisher.StrictSubscriber.onNext(StrictSubscriber.java:89) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.publisher.FluxInterval$IntervalRunnable.run(FluxInterval.java:124) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.scheduler.PeriodicWorkerTask.call(PeriodicWorkerTask.java:59) ~[reactor-core-3.6.10.jar:3.6.10]
        at reactor.core.scheduler.PeriodicWorkerTask.run(PeriodicWorkerTask.java:73) ~[reactor-core-3.6.10.jar:3.6.10]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
        at java.base/java.lang.Thread.run(Thread.java:840) ~[na:na]
        Suppressed: java.lang.NullPointerException: Cannot invoke "org.apache.catalina.connector.OutputBuffer.isBlocking()" because "this.ob" is null
                at org.apache.catalina.connector.CoyoteOutputStream.checkNonBlockingWrite(CoyoteOutputStream.java:154) ~[tomcat-embed-core-10.1.30.jar:10.1.30]
                at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:101) ~[tomcat-embed-core-10.1.30.jar:10.1.30]
                at org.springframework.web.context.request.async.StandardServletAsyncWebRequest$LifecycleServletOutputStream.write(StandardServletAsyncWebRequest.java:398) ~[spring-web-6.1.13.jar:6.1.13]
                at java.base/java.io.OutputStream.write(OutputStream.java:127) ~[na:na]
                at org.springframework.web.servlet.function.SseServerResponse$DefaultSseBuilder.writeObject(SseServerResponse.java:213) ~[spring-webmvc-6.1.13.jar:6.1.13]
                at org.springframework.web.servlet.function.SseServerResponse$DefaultSseBuilder.data(SseServerResponse.java:197) ~[spring-webmvc-6.1.13.jar:6.1.13]
                at com.example.demo.GraphQlSseHandler133$SseSubscriber.writeResult(GraphQlSseHandler133.java:112) ~[main/:na]
                at com.example.demo.GraphQlSseHandler133$SseSubscriber.hookOnNext(GraphQlSseHandler133.java:106) ~[main/:na]
                at com.example.demo.GraphQlSseHandler133$SseSubscriber.hookOnNext(GraphQlSseHandler133.java:1) ~[main/:na]
                at reactor.core.publisher.BaseSubscriber.onNext(BaseSubscriber.java:160) ~[reactor-core-3.6.10.jar:3.6.10]
                ... 22 common frames omitted
jjavery commented 5 days ago

Also there are some other exceptions in the log when the client disconnects e.g. browser navigates away from the page.

bclozel commented 5 days ago

Thanks for reaching out @jjavery

I'm reproducing this behavior and I had a look. I think there are several things to be considered here:

  1. the call that causes the NPE to be raised
  2. depending on the error/timeout case, whether we should complete the event stream or not
  3. how we should deal with the request async timeout, including the default value and possible customizations

I think that we can solve 1) and 2) in this issue, improving the current GraphQlSseHandler implementation.

As for 3), I'm wondering what would be your expectations there. I saw that you prepared a tomcat configuration to extend the default async response timeout. Would you expect this to be configurable on the SSE handler directly for all subscriptions, without changing the application-wide default? We could tackle that in a different issue.

bclozel commented 5 days ago

I think I took care of both 1) and 2) with the change above. I'll wait for your feedback @jjavery before considering 3).

You can test this change with your repro application by adding the following to your build.gradle:

ext['spring-graphql.version'] = "1.3.3-SNAPSHOT"

repositories {
    mavenCentral()
    maven { url 'https://repo.spring.io/milestone' }
    maven { url 'https://repo.spring.io/snapshot' }
}

You'll need of course to completely remove your GraphQlSseHandler133 class.

jjavery commented 4 days ago

@bclozel Thanks! This is fantastic.

For 3), I don't know if this is possible (or advisable—I haven't put too much thought into it), but my preference is for subscriptions to have no timeout by default, with the ability to configure the timeout per each subscription. If that's not possible then yes, it could be configurable on the SSE handler for all subscriptions.

I'm still seeing some warnings in the logs for each async response timeout:

2024-10-17T13:23:22.023-05:00  WARN 62828 --- [nio-8080-exec-6] .w.s.m.s.DefaultHandlerExceptionResolver : Ignoring exception, response committed already: org.springframework.web.context.request.async.AsyncRequestTimeoutException
2024-10-17T13:23:22.024-05:00  WARN 62828 --- [nio-8080-exec-6] .w.s.m.s.DefaultHandlerExceptionResolver : Resolved [org.springframework.web.context.request.async.AsyncRequestTimeoutException]

And exceptions when the client disconnects:

2024-10-17T13:38:18.597-05:00 ERROR 62828 --- [nio-8080-exec-3] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] threw exception

java.io.IOException: Broken pipe
        at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:na]
        <...stack trace here...>
        at java.base/java.lang.Thread.run(Thread.java:840) ~[na:na]

2024-10-17T13:38:18.601-05:00 ERROR 62828 --- [nio-8080-exec-3] o.a.c.c.C.[.[.[/].[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception

java.io.IOException: Broken pipe
        at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:na]
        <...stack trace here...>
        at java.base/java.lang.Thread.run(Thread.java:840) ~[na:na]

Should I create separate issues for these?

bclozel commented 4 days ago

Thanks for your feedback. I will discuss our timeout options with the rest of the team.

As for the other logs:

No need to create a new issue for now.

rstoyanchev commented 3 days ago

We've had an initial discussion.

In the SSE handler, the operation with the subscription is just text that is not yet parsed by the GraphQL engine. There is no straight forward way to differentiate subscriptions. For the time being, we can expose a timeout on the SSE handler, and that should be easy to set with a property in Boot. I've created #1079.

I don't know if by "no timeout by default" you mean -1 (i.e.never timeout), or no timeout set (i.e. leave it to the underlying server, 30 seconds). Arguably 30 seconds is too short for an SSE stream, we could set that to 5 minutes perhaps.

For AsyncRequestTimeoutException, we can consider the same as what we did for AsyncRequestNotUsableException in https://github.com/spring-projects/spring-framework/issues/33225.

For the Broken Pipe, the exception seems to be logged by Tomcat there. We can't control that, but we'll check if it is easy to reproduce to understand why the exception remains not handled.

rstoyanchev commented 21 hours ago

I've had a look at the "Broken pipe" errors. Tomcat notifies us of the IOException that happened while writing, but as there is no error handling for that, it remains unhandled, and eventually logged by Tomcat.

I've created https://github.com/spring-projects/spring-framework/issues/33763 to improve that. In the mean you can add exception handling. For example in the demo:

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Bean
    WebMvcRegistrations webMvcRegistrations() {
        return new WebMvcRegistrations() {
            @Override
            public ExceptionHandlerExceptionResolver getExceptionHandlerExceptionResolver() {
                // Allow use of @ControllerAdvice for functional endpoints
                ExceptionHandlerExceptionResolver resolver = new ExceptionHandlerExceptionResolver();
                resolver.setMappedHandlerPredicate(handler -> true);
                return resolver;
            }
        };
    }

    @ControllerAdvice
    public static class GlobalExceptionHandler {

        @ExceptionHandler
        public Object handle(IOException ex) {
            return (!DisconnectedClientHelper.isClientDisconnectedException(ex) ?
                    ResponseEntity.internalServerError().build() : null);
        }

    }
}

For the default value of timeouts, we'are going to leave that without any setting in spring-graphql, but I've created https://github.com/spring-projects/spring-boot/issues/42807 where Boot can set some default.