spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0
56.58k stars 38.13k forks source link

SSE emitter completion callback not called after network error in 3.1.3 #31140

Closed Ironlink closed 12 months ago

Ironlink commented 1 year ago

In Spring Boot 3.1.3, the completion callback does not get called when an SSE emitter completes with a network error. This works as expected in Spring Boot 3.1.2.

In the reproducer below, I've added a call to SseEmitter::completeWithError in an (unsuccessful) attempt to work around this issue in 3.1.3. This call is not needed in 3.1.2.

pom.xml ```xml org.springframework.boot spring-boot-starter-parent 3.1.3 4.0.0 my-repro application 0.1 20 org.springframework.boot spring-boot-starter org.springframework.boot spring-boot-starter-web org.springframework.boot spring-boot-starter-test test org.apache.maven.plugins maven-compiler-plugin 3.11.0 ${java.version} org.apache.maven.plugins maven-dependency-plugin 3.6.0 org.apache.maven.plugins maven-jar-plugin 3.3.0 org.apache.maven.plugins maven-surefire-plugin 3.0.0 ```
ReproTest.java ```java package repro; import org.junit.jupiter.api.Test; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.boot.test.context.SpringBootTest; import org.springframework.boot.test.web.server.LocalServerPort; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.scheduling.annotation.EnableScheduling; import org.springframework.scheduling.annotation.Scheduled; import org.springframework.web.bind.annotation.GetMapping; import org.springframework.web.bind.annotation.RestController; import org.springframework.web.servlet.mvc.method.annotation.SseEmitter; import java.io.IOException; import java.io.OutputStreamWriter; import java.net.InetSocketAddress; import java.net.Socket; import java.time.Duration; import java.util.HashSet; import java.util.Set; import java.util.concurrent.TimeUnit; import static org.springframework.boot.test.context.SpringBootTest.WebEnvironment.RANDOM_PORT; @SpringBootTest(webEnvironment = RANDOM_PORT, classes = ReproTest.class) @SpringBootApplication public class ReproTest { record WebListener(SseEmitter sseEmitter) { static WebListener create() { return new WebListener(new SseEmitter(0L)); } } static class EventSource { private final Set listeners = new HashSet<>(); void addListener(WebListener listener) { System.out.println("Adding listener"); listeners.add(listener); listener.sseEmitter().onCompletion(() -> { System.out.println("Removing listener"); listeners.remove(listener); }); } Set listeners() { return Set.copyOf(listeners); } @Scheduled(fixedRate = 2, timeUnit = TimeUnit.SECONDS) void sendEvents() { for (WebListener listener : listeners) { try { listener.sseEmitter().send(SseEmitter.event().name("PING")); } catch (IOException | RuntimeException e) { try { listener.sseEmitter().completeWithError(e); } catch (Exception ex) { ex.printStackTrace(); } e.printStackTrace(); } } } } @EnableScheduling @Configuration static class Config { @Bean EventSource eventSource() { return new EventSource(); } } @RestController static class Controller { @Autowired private EventSource eventSource; @GetMapping("/stream") public SseEmitter streamEvents() { WebListener listener = WebListener.create(); eventSource.addListener(listener); return listener.sseEmitter(); } } @LocalServerPort int port; @Autowired EventSource eventSource; @Test void listener_should_be_removed_after_network_error() throws IOException, InterruptedException { try (Socket socket = new Socket()) { socket.setReceiveBufferSize(1); socket.connect(new InetSocketAddress("localhost", port)); try (var writer = new OutputStreamWriter(socket.getOutputStream())) { writer.write(""" GET /stream Accept: text/event-stream """); writer.flush(); } Thread.sleep(1_000); } // socket.close() Thread.sleep(20_000); assert eventSource.listeners().isEmpty(); } } ```
knalli commented 1 year ago

We are also seeing this issue after upgrading to Spring Boot 3.1.3 and therefor its upgrade to Tomcat 10.1.12. We didn't notice the actual missing of the error handler, but these log messages:

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.Context.decrementInProgressAsyncCount()" because "this.context" is null
    at org.apache.catalina.core.AsyncContextImpl.decrementInProgressAsyncCount(AsyncContextImpl.java:441)
    at org.apache.coyote.AsyncStateMachine.asyncPostProcess(AsyncStateMachine.java:295)
    at org.apache.coyote.AbstractProcessor.asyncPostProcess(AbstractProcessor.java:197)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:78)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:894)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1740)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.base/java.lang.Thread.run(Thread.java:833)

@Ironlink I have replayed your test case and can see exactly the same log messages.

After overriding/pinning the dependency to Tomcat to 10.1.11 manually (the version included in SB 3.1.2 which works), the test will run fine again, no log messages anymore.

Speaking of our environment (just for the records):

After playing with downgrading the different parties, I would rather go this is a bug in Tomcat 10.1.12+ and not in Spring or Spring Boot. Maybe someone else can confirm this?

tarioch commented 1 year ago

As far as I can tell, the onComplete is also not called anymore in our case and that is using Jetty.

tarioch commented 1 year ago

Did some more checking, looks like my case is also broken with 3.1.2, so maybe that's another issue (or based on an upgrade of jetty instead of tomcat)

cgoeller commented 1 year ago

I have the same issue with Spring Boot 3.1.3 and Tomcat. Downgrade to 3.1.2 fixes the problem.

The NullPointerException occurs when the SSE connection is closed in the browser using EventSource.close() Completion Callback is never executed.

bclozel commented 1 year ago

See https://bz.apache.org/bugzilla/show_bug.cgi?id=67235

gotson commented 1 year ago

See https://bz.apache.org/bugzilla/show_bug.cgi?id=67235

Seems tho was fixed in Tomcat 10.1.14. Hopefully this can be included in spring boot 3.2.0

gotson commented 1 year ago

Just tried with Spring Boot 3.1.5 which pulls Tomcat 10.1.15. but i still have the same issue.

mohmf commented 12 months ago

Seems tho was fixed in Tomcat 10.1.14. Hopefully this can be included in spring boot 3.2.0

Still the issue with Tomcat version 10.1.15, downgraded to 10.1.11 and the issue gone.

bclozel commented 12 months ago

If downgrading to a maintenance version of Tomcat fixes the problem, this points to a Tomcat issue. Please report this against the Tomcat project with minimal sample applications and stack traces. I'm closing this issue now. We can reopen this if it turns out Spring is involved in any way.

gotson commented 12 months ago

See https://bz.apache.org/bugzilla/show_bug.cgi?id=67235

Seems tho was fixed in Tomcat 10.1.14. Hopefully this can be included in spring boot 3.2.0

According to this it should be fixed, but the issue still happens in spring boot. I think it would be helpful if spring maintainers could advise whether the issue is still tomcat related with that latest version, or spring related.

bclozel commented 12 months ago

@gotson When using the repro project with Tomcat 10.1.12, I'm indeed seeing the following in the logs:

java.lang.NullPointerException: Cannot invoke "org.apache.catalina.Context.decrementInProgressAsyncCount()" because "this.context" is null
    at org.apache.catalina.core.AsyncContextImpl.decrementInProgressAsyncCount(AsyncContextImpl.java:441) ~[tomcat-embed-core-10.1.12.jar:10.1.12]
    at org.apache.coyote.AsyncStateMachine.asyncPostProcess(AsyncStateMachine.java:295) ~[tomcat-embed-core-10.1.12.jar:10.1.12]
    at org.apache.coyote.AbstractProcessor.asyncPostProcess(AbstractProcessor.java:197) ~[tomcat-embed-core-10.1.12.jar:10.1.12]
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:78) ~[tomcat-embed-core-10.1.12.jar:10.1.12]

With Spring Boot 3.1.5 (Tomcat 10.1.15), I'm not seeing those anymore. Fixing the repro case to also remove the listener in case of failures makes the repro project work fine to me. Are you saying you're still seeing this NPE in your application?

I'm not sure which bug you're referring to, maybe you are facing a different issue? Without a small application reproducing the problem, I can't really help here.

gotson commented 11 months ago

Fixing the repro case to also remove the listener in case of failures makes the repro project work fine to me

This is the bit i was missing it seems. All good now.

Ironlink commented 11 months ago

Fixing the repro case to also remove the listener in case of failures makes the repro project work fine to me

I disagree with using the word fixing here. The javadoc for SseEmitter::onCompletion says:

This method is called from a container thread when an async request completed for any reason including timeout and network error. This method is useful for detecting that a ResponseBodyEmitter instance is no longer usable.

As documented, I think the repro is corrrect. With Tomcat 10.1.15, this is clearly not true. As far as I can tell, this comes down to Tomcat no longer invoking jakarta.servlet.AsyncListener::onComplete after a network error. The contract for AsyncListener is not immediately clear to me, although it does seem like a regression on the Tomcat side.

bclozel commented 11 months ago

Thanks for the analysis @Ironlink - can you report this to the Tomcat project and share the link here?

Ironlink commented 11 months ago

Sorry, I forgot to update this issue. The fix for this will be included in the next version of Tomcat, 10.1.17.

https://bz.apache.org/bugzilla/show_bug.cgi?id=68227