mgravell / Pipelines.Sockets.Unofficial

.NET managed sockets wrapper using the new "Pipelines" API

Linux performance problem #31

Closed patricksuo closed 5 years ago

patricksuo commented 5 years ago

echo program: https://github.com/sillyousu/TryoutPipelineSockets

Linux

supei@sandbox-dev-hk:~/frame_echo/frame_echo_client$ dotnet EchoClient.dll  
5000 clients, payload 64 bytes, 2 rounds
Total Time Elapsed: 18469.8927 Milliseconds
0 error of 5000
connect p90: 1011.27ms   p95: 3003.08ms   p99: 3005.71ms   p99.9: 3007.05ms
echo    p90: 113.15ms    p95: 288.96ms    p99: 3059.55ms   p99.9: 15394.67ms
total   p90: 1257.19ms   p95: 3008.27ms   p99: 4809.88ms   p99.9: 18401.26ms

win:

~\source\dotnet\TryPipelineSockets\EchoClient> dotnet.exe .\bin\Release\netcoreapp2.2\EchoClient.dll
5000 clients, payload 64 bytes, 2 rounds
Total Time Elapsed: 1186.6827 Milliseconds
0 error of 5000
connect p90: 9.31ms    p95: 25.44ms   p99: 49.06ms    p99.9: 500.18ms
echo    p90: 61.87ms   p95: 70.42ms   p99: 95.84ms    p99.9: 112.35ms
total   p90: 75.67ms   p95: 95.45ms   p99: 137.40ms   p99.9: 501.40ms

Update: cleaned up the percentile tables a little bit.

patricksuo commented 5 years ago

C# client X Rust server

5000 clients, payload 64 bytes, 2 rounds
Total Time Elapsed: 1285.9418 Milliseconds
0 error of 5000
connect p90: 13.27ms   p95: 1000.52ms   p99: 1006.45ms   p99.9: 1007.12ms
echo    p90: 24.47ms   p95: 35.44ms     p99: 43.63ms     p99.9: 47.52ms
total   p90: 45.52ms   p95: 1006.98ms   p99: 1011.27ms   p99.9: 1012.68ms

supei@sandbox-dev-hk:~/workspace/rust-echo$ cat src/main.rs 
extern crate tokio;

use std::net::SocketAddr;
use tokio::io::copy;
use tokio::net::TcpListener;
use tokio::prelude::*;

fn main() {
    // Bind the server's socket.
    let addr: SocketAddr = "127.0.0.1:12345".parse().unwrap();
    let listener = TcpListener::bind(&addr).expect("unable to bind TCP listener");

    // Pull out a stream of sockets for incoming connections
    let server = listener
        .incoming()
        .map_err(|e| eprintln!("accept failed = {:?}", e))
        .for_each(|sock| {
            // Split up the reading and writing parts of the
            // socket.
            let (reader, writer) = sock.split();

            // A future that echos the data and returns how
            // many bytes were copied...
            let bytes_copied = copy(reader, writer);

            // ... after which we'll print what happened.
            let handle_conn = bytes_copied
                .map(|amt| println!("wrote {:?} bytes", amt))
                .map_err(|err| eprintln!("IO error {:?}", err));

            // Spawn the future as a concurrent task.
            tokio::spawn(handle_conn)
        });

    // Start the Tokio runtime
    tokio::run(server);
}

patricksuo commented 5 years ago

Both server/client are running on the same machine.

I haven't dug deep enough to have a theory yet. I will try out a plain-old .NET Core TCP server + Pipelines client this weekend.
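
By "plain-old .NET Core TCP server" I mean something along these lines (only a sketch with made-up names, not code from the tryout repo): an async Socket accept loop that echoes bytes back, with a generous listen backlog since 5000 clients connect at roughly the same time.

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class PlainEchoServer
{
    static async Task Main()
    {
        var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 12345));
        // A generous backlog matters when thousands of clients connect at once.
        listener.Listen(512);

        while (true)
        {
            Socket client = await listener.AcceptAsync();
            _ = Task.Run(() => EchoAsync(client)); // one echo loop per connection
        }
    }

    static async Task EchoAsync(Socket client)
    {
        var buffer = new byte[4096];
        try
        {
            int read;
            while ((read = await client.ReceiveAsync(new ArraySegment<byte>(buffer), SocketFlags.None)) > 0)
            {
                await client.SendAsync(new ArraySegment<byte>(buffer, 0, read), SocketFlags.None);
            }
        }
        finally
        {
            client.Dispose();
        }
    }
}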

patricksuo commented 5 years ago

Off-CPU analysis is HARD :(

mgravell commented 5 years ago

OK, just so we're on the same page here - can you please be very explicit as to which values you're comparing and/or seeing a problem with? I can take guesses, but I'd really like to know that we're talking about the same things.

patricksuo commented 5 years ago

End users care about overall response time, so let's look at the 99th percentile: on Windows, 99% of clients finish within 137.40 ms, but on Linux the p99 is 4809.88 ms.

sdanyliv commented 5 years ago

@sillyousu, can you share the results from your Tryout project? I'm looking for a good, simple, and effective TCP/IP server implementation.

patricksuo commented 5 years ago

@sdanyliv

In the first round of testing, the benchmark setup was pretty naive: both client and server were running on the same machine, so please ignore the figures above. (It did identify a TCP listen-socket backlog issue, though.)

I think an effective implementation depends heavily on the workload. My case simulates a massive burst of new clients:

  1. a lot of new TCP connections coming in
  2. a transport-layer handshake: exchange public keys, etc.
  3. a business-logic-layer handshake: send a login message and read the response
  4. then the connection becomes (relatively) idle

with some assumptions/constraints.
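
Roughly, each simulated client does something like this (illustrative sketch only; DoKeyExchangeAsync and DoLoginAsync are placeholder names rather than code from the tryout repo, and Task.Delay stands in for the idle period):

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

static class SimulatedClient
{
    public static async Task RunAsync(IPEndPoint server)
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            // 1. one of thousands of connections arriving at roughly the same time
            await socket.ConnectAsync(server);

            var stream = new NetworkStream(socket, ownsSocket: false);

            // 2. transport-layer handshake: exchange public keys, derive a session key
            await DoKeyExchangeAsync(stream);

            // 3. business-logic handshake: send the login message, wait for the response
            await DoLoginAsync(stream);

            // 4. afterwards the connection is (relatively) idle
            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }

    // Placeholders for the two handshake phases described above.
    static Task DoKeyExchangeAsync(NetworkStream stream) => Task.CompletedTask;
    static Task DoLoginAsync(NetworkStream stream) => Task.CompletedTask;
}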

I wrote a benchmark client in Go; here are the results:

go client X PipelineServer
5000 clients, payload 256 bytes, 4 rounds
Total Time Elapsed: 1251.248651 Milliseconds
connect p90: 111.65ms   p95: 131.76ms   p99: 159.71ms   p99.9: 218.91ms
echo    p90: 615.07ms   p95: 617.51ms   p99: 710.35ms   p99.9: 804.46ms
total   p90: 647.31ms   p95: 652.15ms   p99: 771.58ms   p99.9: 899.14ms

go client X TcpSocketServer
5000 clients, payload 256 bytes, 4 rounds
Total Time Elapsed: 1214.614694 Milliseconds
connect p90: 113.92ms   p95: 131.91ms   p99: 161.02ms   p99.9: 218.46ms
echo    p90: 391.37ms   p95: 479.51ms   p99: 543.37ms   p99.9: 571.72ms
total   p90: 452.04ms   p95: 561.98ms   p99: 602.10ms   p99.9: 683.66ms

go client X go server
5000 clients, payload 256 bytes, 4 rounds
Total Time Elapsed: 1207.234204 Milliseconds
connect p90: 103.75ms   p95: 109.55ms   p99: 145.01ms   p99.9: 205.57ms
echo    p90: 111.03ms   p95: 118.50ms   p99: 209.83ms   p99.9: 210.44ms
total   p90: 149.21ms   p95: 181.59ms   p99: 302.75ms   p99.9: 318.41ms

All three combinations have similar throughput; the Go implementation has the best responsiveness.

I think I can avoid the memory copy in the PipelineFrameProtocol.
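
The idea (just a sketch of the direction, not the actual PipelineFrameProtocol code; the 4-byte big-endian length prefix and the FrameReader/onFrame names are assumptions) is to hand each payload to the handler as a slice of the pipe's own buffer instead of copying it into a new byte[]:

using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class FrameReader
{
    public static async Task ReadFramesAsync(PipeReader reader, Action<ReadOnlySequence<byte>> onFrame)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // Peel off as many complete frames as the buffer currently holds.
            while (TryReadFrame(ref buffer, out ReadOnlySequence<byte> payload))
            {
                onFrame(payload); // a view over the pipe's memory; nothing is copied
            }

            // Everything before buffer.Start is consumed; everything up to
            // buffer.End has been examined (so ReadAsync waits for more data).
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted) break;
        }

        reader.Complete();
    }

    static bool TryReadFrame(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> payload)
    {
        payload = default;
        if (buffer.Length < 4) return false;

        // Copy only the 4-byte length prefix onto the stack; the payload stays put.
        Span<byte> header = stackalloc byte[4];
        buffer.Slice(0, 4).CopyTo(header);
        int length = BinaryPrimitives.ReadInt32BigEndian(header);

        if (buffer.Length < 4 + length) return false;

        payload = buffer.Slice(4, length);
        buffer = buffer.Slice(4 + length); // advance past this frame
        return true;
    }
}

The caveat is that the payload slice is only valid until AdvanceTo is called, so the handler has to finish with it (or copy it explicitly) before the next read.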

It would be appreciated if anyone could review my tryout project and correct anything that's wrong: https://github.com/sillyousu/TryoutPipelineSockets

patricksuo commented 5 years ago

I'm also going to try out another case: massive simultaneous RPC requests in a server cluster.

sdanyliv commented 5 years ago

@sillyousu, I have tried your server implementation, and there is bad news. I ran the server in release mode (Windows) and ran the client application several times, one after another. After approximately four runs the server stops accepting connections and refuses them. I also noticed a lot of caught ObjectDisposedExceptions. @mgravell, there are still things to improve.

mgravell commented 5 years ago

@sdanyliv is the repro/repo up-to-date with what you've tried / where you are?

patricksuo commented 5 years ago

After approximately four runs the server stops accepting connections and refuses them.

Blind shot: could you be running out of ephemeral ports? The client application is always on the active-close side, so there will be tons of TCP sockets in the TIME_WAIT state.
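
One quick way to check that theory from managed code (a sketch; this just counts what the OS currently reports) is to count the sockets sitting in TIME_WAIT after a few runs:

using System;
using System.Linq;
using System.Net.NetworkInformation;

class TimeWaitCheck
{
    static void Main()
    {
        // Snapshot of all TCP connections known to the OS, including TIME_WAIT ones.
        var connections = IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();
        int timeWait = connections.Count(c => c.State == TcpState.TimeWait);
        Console.WriteLine($"TIME_WAIT sockets: {timeWait} of {connections.Length} total");
    }
}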