http4s / blaze

Blazing fast NIO microframework and Http Parser
Apache License 2.0

Exceptions when benchmarking: ERROR NIO1SocketServerGroup #667

Open jan0sch opened 5 years ago

jan0sch commented 5 years ago

Hi,

While benchmarking an http4s service implementation, I ran into some issues. Occasionally the service errored, producing the following kinds of exceptions in the logs.

First kind

After this exception the service continued to reply, but the number of errors kept ramping up.

ERROR NIO1SocketServerGroup - Error handling client channel. Closing.
java.util.concurrent.RejectedExecutionException: This SelectorLoop is closed.
        at org.http4s.blaze.channel.nio1.SelectorLoop.enqueueTask(SelectorLoop.scala:118)
        at org.http4s.blaze.channel.nio1.SelectorLoop.initChannel(SelectorLoop.scala:139)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup.org$http4s$blaze$channel$nio1$NIO1SocketServerGroup$$handleClientChannel(NIO1SocketServerGroup.scala:290)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.acceptNewConnections(NIO1SocketServerGroup.scala:148)
        at org.http4s.blaze.channel.nio1.NIO1SocketServerGroup$SocketAcceptor.opsReady(NIO1SocketServerGroup.scala:119)
        at org.http4s.blaze.channel.nio1.SelectorLoop.processKeys(SelectorLoop.scala:200)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:171)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)

Second kind

After this exception the service stopped responding.

ERROR SelectorLoop - Unhandled exception in selector loop
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.close0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.close(SocketDispatcher.java:55)
        at java.base/sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:907)
        at java.base/sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:267)
        at java.base/sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:116)
        at java.base/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
        at java.base/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:163)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1SocketServerGroup - Listening socket(/0.0.0.0:53248) closed forcibly.
java.nio.channels.ShutdownChannelGroupException: null
        at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)
ERROR NIO1HeadStage - Abnormal NIO1HeadStage termination
java.nio.channels.ShutdownChannelGroupException: null
        at org.http4s.blaze.channel.nio1.SelectorLoop.killSelector(SelectorLoop.scala:225)
        at org.http4s.blaze.channel.nio1.SelectorLoop.org$http4s$blaze$channel$nio1$SelectorLoop$$runLoop(SelectorLoop.scala:186)
        at org.http4s.blaze.channel.nio1.SelectorLoop$$anon$1.run(SelectorLoop.scala:68)
        at java.base/java.lang.Thread.run(Thread.java:834)

These errors are hard to reproduce. In general, I observed that once a service started erroring, it would continue to produce errors for as long as the JVM was left running.

System environment

The code for the service can be found in the following repository: https://github.com/jan0sch/pfhais

It is located within the pure folder. The configuration files for the jmeter benchmarks can be found in the jmeter folder.

Service workstation

Client workstation

Apache JMeter 5.1.1 was used to run the benchmark.

CharlesAHunt commented 3 years ago

@jan0sch @rossabaker I also have this issue with the exact same error message.

The error only occurs when using WebSockets with NIO1. Switching to NIO2 resolves the issue.

rossabaker commented 3 years ago

Unfortunately, the NIO2 server is deprecated in blaze. Performance was worse by all measures, and we didn't backport the CVE fix.

sergiojoker11 commented 2 years ago

As of 0.23.10 I guess there is no fix for this issue, is that right? What is the preferred workaround? Would you recommend switching to a different server implementation?

jan0sch commented 2 years ago

The ember server has been the new default for some time now. I did not try to re-create the bug with ember, but so far I have had no trouble with it (I am using it in production, although only under light to moderate load).
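
For anyone looking for the workaround mentioned above, here is a minimal sketch of serving routes with the ember server instead of blaze. It assumes http4s 0.23.x with the `http4s-ember-server` and `http4s-dsl` modules on the classpath. The object name `EmberExample`, the `/ping` route, and port 8080 are placeholders for illustration and are not taken from the pfhais repository. Whether this avoids the exceptions above under benchmark load has not been verified in this thread.

```scala
import cats.effect.{IO, IOApp}
import com.comcast.ip4s._
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.ember.server.EmberServerBuilder
import org.http4s.implicits._

// Hypothetical example application, not the service from the original report.
object EmberExample extends IOApp.Simple {

  // Placeholder route; substitute the routes of the real service here.
  val routes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "ping" => Ok("pong")
  }

  // Build the ember server as a Resource and keep it running until the
  // application is cancelled.
  val run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0")
      .withPort(port"8080") // assumed port
      .withHttpApp(routes.orNotFound)
      .build
      .useForever
}
```

Existing `HttpRoutes` values should carry over unchanged, since only the server builder differs between the blaze and ember backends.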