Change to default Netty configuration
The main improvement in this PR is a change in the sizing of Netty's EventLoopGroup. By default, Netty will size the EventLoopGroup at nCPU * 2, and will use this pool for 2 purposes:
Accepting new connections (boss)
Running the pipeline and IO (worker)
The sizing of this pool is extremely conservative and assumes that some blocking IO might occur in Netty's worker threads. It also guards against cases where the underlying transport is blocking. However, since we're delegating all user-defined logic to ZIO's executor, and since the transports we're using are all non-blocking (Epoll, KQueue, NIO), a better configuration is to:
Use a separate EventLoopGroup with nThreads=1 for the boss group. This way, we avoid delays in accepting new connections when the worker threads are busy serving existing connections
Size the worker EventLoopGroup with nThreads=nCPU since there's no blocking code running in the worker threads
With this change, we see a ~10% increase in throughput vs the default configuration (see the sketch below).
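As an illustration of this split with Netty's bootstrap API, here is a minimal sketch using the NIO transport (the Epoll / KQueue variants are analogous). The value names and the omission of a child handler are for brevity only; this is not the actual code from the PR:

```scala
import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.nio.NioEventLoopGroup
import io.netty.channel.socket.nio.NioServerSocketChannel

// dedicated 1-thread group that only accepts connections
val bossGroup = new NioEventLoopGroup(1)
// nCPU-sized group that runs the pipeline / IO; no blocking work ever runs here
val workerGroup = new NioEventLoopGroup(Runtime.getRuntime.availableProcessors())

val bootstrap = new ServerBootstrap()
  .group(bossGroup, workerGroup)
  .channel(classOf[NioServerSocketChannel])
```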
Use ctx.channel.write instead of ctx.write in ServerInboundHandler
This part is not really well documented in Netty, but the main differences are:
ctx.write / ctx.writeAndFlush will walk the pipeline from the current handler to the head of the pipeline when writing a response
ctx.channel.write / ctx.channel.writeAndFlush will start at the tail of the pipeline and walk through all the handlers when writing a response
In our case, short-cutting writes to the pipeline by starting at the current handler doesn't bring any benefit because the ServerInboundHandler is the last one in the pipeline. In addition, as this video on Netty's best practices seems to suggest, when we're writing to the channel from a different thread it's better to use ctx.channel.write.
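To make the difference concrete, here is a stand-alone sketch (a toy handler, not the actual ServerInboundHandler):

```scala
import io.netty.channel.{ChannelHandlerContext, ChannelInboundHandlerAdapter}

class LastHandler extends ChannelInboundHandlerAdapter {
  override def channelRead(ctx: ChannelHandlerContext, msg: AnyRef): Unit = {
    // ctx.writeAndFlush(msg)
    //   would start at this handler and walk outbound handlers towards the head

    // starts at the tail and traverses every outbound handler in the pipeline;
    // equivalent here since this handler is the last one, and the form the
    // best-practices talk recommends when writing from a different thread
    ctx.channel.writeAndFlush(msg)
  }
}
```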
Other changes
removed the runtime scope from the epoll / kqueue native transports. These are generally tiny (~20kb each), and I think it's better for users to have them available by default rather than having to install them manually and align Netty versions (see the build.sbt sketch after this list)
updated the ./.devcontainer files to install wrk and bump the Java / SBT versions, making it quicker to spin up a devcontainer for benchmarking purposes
updated build.sbt to fork processes when running sbt zioHttpExample/runMain example.xx, to avoid issues when restarting servers
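For reference, a build.sbt-style sketch of the two build-related changes above. The artifact names are the real Netty modules, but the version value and classifiers shown here are illustrative, not the exact diff:

```scala
// should match the Netty version zio-http is built against (illustrative value)
val NettyVersion = "4.1.100.Final"

libraryDependencies ++= Seq(
  // previously declared with `% Runtime`; without the scope qualifier they are
  // regular compile-scope dependencies and reach users transitively
  "io.netty" % "netty-transport-native-epoll"  % NettyVersion classifier "linux-x86_64",
  "io.netty" % "netty-transport-native-kqueue" % NettyVersion classifier "osx-x86_64"
)

// fork a fresh JVM for runMain so a restarted example server doesn't collide
// with a previous run that is still holding the port
fork := true
```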