netty / netty-incubator-transport-io_uring


Why does the official benchmark show no difference between io_uring and NIO in our tests? #106

Open lwglgy opened 3 years ago

lwglgy commented 3 years ago

We ran the official benchmark program, but the results show almost no difference between io_uring and NIO. Our test host has 32 cores and 200 GB of RAM, and the kernel version is 5.13.0.

The following is our test code.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class EchoNioServer {
    private static final int PORT = Integer.parseInt(System.getProperty("port", "8088"));

    public static void main(String[] args) {
        System.out.println("start Nio server");
        EventLoopGroup group = new NioEventLoopGroup();
        final EchoServerHandler serverHandler = new EchoServerHandler();
        // boss group accepts incoming connections
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        // worker group handles connections that have been accepted
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
                    .option(ChannelOption.SO_REUSEADDR, true)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        public void initChannel(SocketChannel ch) throws Exception {
                            ChannelPipeline p = ch.pipeline();
                            //p.addLast(new LoggingHandler(LogLevel.INFO));
                            p.addLast(serverHandler);
                        }
                    });

            // Start the server.
            ChannelFuture f = b.bind(PORT).sync();

            // Wait until the server socket is closed.
            f.channel().closeFuture().sync();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            // Shut down all event loops to terminate all threads.
            group.shutdownGracefully();
            workerGroup.shutdownGracefully();
            bossGroup.shutdownGracefully();
        }
    }
}
```

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.socket.SocketChannel;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringServerSocketChannel;

// This is using io_uring
public class EchoIOUringServer {
    private static final int PORT = Integer.parseInt(System.getProperty("port", "8081"));

    public static void main(String[] args) {
        System.out.println("start iouring server");
        EventLoopGroup group = new IOUringEventLoopGroup();
        final EchoServerHandler serverHandler = new EchoServerHandler();
        // boss group accepts incoming connections
        EventLoopGroup bossGroup = new IOUringEventLoopGroup();
        // worker group handles connections that have been accepted
        EventLoopGroup workerGroup = new IOUringEventLoopGroup();

        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
                    .option(ChannelOption.SO_REUSEADDR, true)
                    .channel(IOUringServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        public void initChannel(SocketChannel ch) throws Exception {
                            ChannelPipeline p = ch.pipeline();
                            //p.addLast(new LoggingHandler(LogLevel.INFO));
                            p.addLast(serverHandler);
                        }
                    });

            // Start the server.
            ChannelFuture f = b.bind(PORT).sync();

            // Wait until the server socket is closed.
            f.channel().closeFuture().sync();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            // Shut down all event loops to terminate all threads.
            group.shutdownGracefully();
            workerGroup.shutdownGracefully();
            bossGroup.shutdownGracefully();
        }
    }
}
```

```java
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

@ChannelHandler.Sharable
public class EchoServerHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ctx.write(msg);
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Close the connection when an exception is raised.
        ctx.close();
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        // Ensure we are not writing too fast by stopping reads if we cannot flush out data fast enough.
        if (ctx.channel().isWritable()) {
            ctx.channel().config().setAutoRead(true);
        } else {
            ctx.flush();
            if (!ctx.channel().isWritable()) {
                ctx.channel().config().setAutoRead(false);
            }
        }
    }
}
```


The code above is our test procedure, taken from the benchmark on the official website, but the results we measured show almost no difference.

(screenshot of benchmark results)

normanmaurer commented 3 years ago

Is this on a real machine or on a VM? Also, can you just use `new IOUringEventLoopGroup(1)` and `new NioEventLoopGroup(1)`?
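
For clarity, a minimal sketch of the suggested change applied to the servers posted above; only the group construction differs, the rest of the bootstrap stays the same:

```java
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;

public class SingleThreadGroups {
    // NIO variant: one boss thread to accept, one worker thread to serve.
    static final EventLoopGroup NIO_BOSS = new NioEventLoopGroup(1);
    static final EventLoopGroup NIO_WORKER = new NioEventLoopGroup(1);

    // io_uring variant: same shape, so both transports are measured with a
    // single event loop and the comparison is not skewed by thread counts.
    static final EventLoopGroup URING_BOSS = new IOUringEventLoopGroup(1);
    static final EventLoopGroup URING_WORKER = new IOUringEventLoopGroup(1);
}
```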

HowHsu commented 3 years ago

Hi Norman, we tried again with `*EventLoopGroup(1)`, and it works: io_uring is about 5 times faster than NIO and epoll in terms of packet rate. However, we also found that the io_uring server uses about 1300% CPU while the NIO/epoll servers use only about 100%. We tracked this down to the io-wq threads. We then repeated the test with a very large ioSqeAsyncThreshold (e.g. 50000000); this time io_uring also used only about 100% CPU, but its packet rate was similar to (even slightly worse than) the NIO/epoll one.
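
For reference, a rough sketch of the configuration described above. It assumes the incubator build in use exposes an `IOUringEventLoopGroup` constructor overload taking a ring size and an `iosqeAsyncThreshold` (check the constructors available in your version); the ring size and threshold values below are illustrative only:

```java
import java.util.concurrent.ThreadFactory;

import io.netty.channel.EventLoopGroup;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.util.concurrent.DefaultThreadFactory;

public class IoUringGroupConfig {
    // Illustrative values: a threshold this large means IOSQE_ASYNC is
    // effectively never set, so no io-wq worker threads should be spawned.
    private static final int RING_SIZE = 4096;
    private static final int HIGH_IOSQE_ASYNC_THRESHOLD = 50_000_000;

    public static EventLoopGroup newSingleThreadGroup() {
        ThreadFactory tf = new DefaultThreadFactory("io_uring-loop");
        // Assumption: this (nThreads, threadFactory, ringSize, iosqeAsyncThreshold)
        // overload exists in the incubator version being benchmarked.
        return new IOUringEventLoopGroup(1, tf, RING_SIZE, HIGH_IOSQE_ASYNC_THRESHOLD);
    }
}
```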

franz1981 commented 2 years ago

Given that IOSQE_ASYNC can have very unexpected behaviour, e.g. spawning an unbounded number of threads to complete requests asynchronously (unless that was addressed in some recent-ish kernel patch, IIRC), maybe it would be better to leave the number of handled fds before using IOSQE_ASYNC unbounded by default, and limit it only if necessary. WDYT @normanmaurer?