mock-server / mockserver

MockServer enables easy mocking of any system you integrate with via HTTP or HTTPS with clients written in Java, JavaScript and Ruby. MockServer also includes a proxy that introspects all proxied traffic including encrypted SSL traffic and supports Port Forwarding, Web Proxying (i.e. HTTP proxy), HTTPS Tunneling Proxying (using HTTP CONNECT) and SOCKS Proxying (i.e. dynamic port forwarding).
http://mock-server.com
Apache License 2.0

Resource over-consumption and "too many open files" #560

Closed: hmil closed this issue 5 years ago

hmil commented 5 years ago

Hi,

We are using mockServer for a few unit tests inside a huge test suite. We started seeing "Too many open files" errors pop up consistently in our CI environment.

Symptoms

Our test suite is constructed like so:

private ClientAndServer mockServer = startClientAndServer(port);

@Before
public void setup() {
    mockServer.reset();
    mockServer.when(
                HttpRequest.request()
                        .withPath(...)
                        .withMethod("GET"))
                .respond(request -> { /* some handler */ });
    mockServer.when(...); // Repeated a few times for each mocked endpoint
}

@After
public void teardown() {
    mockServer.stop();
}

The funny thing is how the test suite fails: out of a little fewer than 10 test cases, a few tests pass first (5 or so), then one fails with the stacktrace below, then a couple more tests pass, and then another one fails.

If this bug were due to a hard leak, we would see a few tests pass, then one fail, and then every test after it fail as well. Instead, some successful tests run after a failing one, which shows that some resources were freed in the meantime. It must therefore be that resources are over-consumed but do not actually leak.

Investigation

So I looked at the internals of mock-server to find out where this over-consumption came from, and I found something strange: in MockServerClient#when you create a new ForwardChainExpectation. This expectation has a lazy WebSocketClient which gets initialized on the call to respond. This WebSocketClient in turn has a per-instance EventLoopGroup. This is the culprit behind the over-consumption of resources. I believe that a NioEventLoopGroup is meant to be created once at server startup and destroyed on shutdown. By calling shutdownGracefully, the WebSocketClient allows allocated resources to stay open for a bit longer (a few seconds according to netty's documentation). In an intensive environment, such as a unit test suite, this can quickly pile up into a large amount of allocated resources.
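
To make the failure mode concrete, here is a minimal, self-contained Netty sketch (not mock-server's code, just the same allocation pattern) showing how creating a NioEventLoopGroup per call and releasing it with shutdownGracefully(), whose default quiet period is about two seconds, can keep many selectors, and therefore file descriptors, open at the same time:

import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;

public class PerCallEventLoopGroupDemo {

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            // One group per "expectation": each group opens its own selectors,
            // i.e. additional file descriptors.
            EventLoopGroup group = new NioEventLoopGroup();

            // ... the group would normally be used to open a connection here ...

            // shutdownGracefully() returns immediately, but the selectors stay
            // open until the quiet period (2 seconds by default) has elapsed.
            group.shutdownGracefully();
        }
        // While the groups are still winding down, the number of open file
        // descriptors spikes even though nothing is permanently leaked.
        Thread.sleep(5_000);
    }
}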

Mitigation

Why does each ForwardChainExpectation have its own WebSocketClient, and why does each WebSocketClient have its own NioEventLoopGroup? Couldn't some of these resources be shared for a more efficient solution? In particular, wouldn't it be better to have WebSocketClient#group supplied by the mock server itself, so that the lifecycle of the EventLoopGroup matches that of the mock server instance? Alternatively, did I misuse MockServer in my test suite (i.e. should I set it up once for all test cases instead of resetting it each time, as in the sketch below)?
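
For reference, here is roughly what the "set it up once" variant could look like; this is only a sketch, assuming the same ClientAndServer API used above, JUnit 4 lifecycle annotations, and a hypothetical port and path:

import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.mockserver.integration.ClientAndServer;
import org.mockserver.model.HttpRequest;
import org.mockserver.model.HttpResponse;

import static org.mockserver.integration.ClientAndServer.startClientAndServer;

public class SharedServerTest {

    // Hypothetical fixed port; any free port works.
    private static final int PORT = 1080;

    private static ClientAndServer mockServer;

    @BeforeClass
    public static void startServer() {
        // One server, and therefore one set of Netty resources, for the whole class.
        mockServer = startClientAndServer(PORT);
    }

    @Before
    public void setup() {
        // Only the expectations are rebuilt between tests, not the server itself.
        mockServer.reset();
        mockServer.when(
                HttpRequest.request()
                        .withPath("/some/endpoint") // hypothetical path
                        .withMethod("GET"))
                .respond(request -> HttpResponse.response().withStatusCode(200));
    }

    @AfterClass
    public static void stopServer() {
        mockServer.stop();
    }
}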

We are going to mitigate this issue on our side by bumping the system resource limits of our CI agents, but fundamentally this bug seems to arise from an over-consumption of system resources caused by the way mock-server behaves in unit test suites.


Here is the reference stacktrace:

java.lang.IllegalStateException: failed to create a child event loop
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:88)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
    at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:52)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:87)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:82)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:63)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:51)
    at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:43)
    at org.mockserver.client.netty.websocket.WebSocketClient.<init>(WebSocketClient.java:32)
    at org.mockserver.client.ForwardChainExpectation.respond(ForwardChainExpectation.java:70)
    ... much more
Caused by: io.netty.channel.ChannelException: failed to open a new selector
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:175)
    at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:149)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:127)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:36)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
    ... 59 more
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.makePipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:65)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:173)
    ... 63 more
jamesdbloom commented 5 years ago

I agree, and I can re-use the NioEventLoopGroup from the MockServerClient (i.e. NettyHttpClient). It may also be possible to re-use the same WebSocketClient for multiple expectations by increasing the multiplexing, although if I remember correctly that is how it was initially implemented and there were issues with numerous requests and responses passing over the same WebSocket due to poor pipelining. From memory, that is why there is a separate NioEventLoopGroup per client: originally there was only a single client, so this didn't matter.

So, initially I can share the NioEventLoopGroup; once that is working I'll see whether it is possible to share the WebSocketClient as well.

jamesdbloom commented 5 years ago

It is not feasible to have a single WebSocketClient because it would be too complex on the client side to support this design across multiple clients in multiple languages. I have, however, shared the NioEventLoopGroup between the NettyHttpClient and the WebSocketClient; this should resolve the issue because there is now only a single NioEventLoopGroup created per client.
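
For illustration, the sharing is conceptually along these lines (a hypothetical sketch of the design, not the actual mock-server source; the constructor signature is assumed):

import io.netty.channel.EventLoopGroup;

// Hypothetical sketch: the WebSocketClient receives the EventLoopGroup that the
// client (NettyHttpClient) already owns instead of creating its own.
public class WebSocketClientSketch {

    private final EventLoopGroup eventLoopGroup;

    public WebSocketClientSketch(EventLoopGroup sharedEventLoopGroup) {
        // No new NioEventLoopGroup per expectation; the owning client controls
        // the group's lifecycle, so shutting the client down releases all
        // selectors in one place.
        this.eventLoopGroup = sharedEventLoopGroup;
    }
}

With this shape there is one group per MockServerClient rather than one per expectation, so an expectation-heavy test suite no longer multiplies the number of open selectors.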

hmil commented 5 years ago

Awesome, thanks for solving this so fast!