snapview / tokio-tungstenite

Future-based Tungstenite for Tokio. Lightweight stream-based WebSocket implementation
MIT License

Performance problem with current thread runtime and large number of connections #247

Closed ry closed 1 year ago

ry commented 1 year ago
[Screenshot: throughput comparison chart, 2022-12-07]

We are comparing echo-server throughput against uWebSockets using the current_thread runtime and 256 connections, and we are seeing a notable ~30% throughput deficit.

Here's the source code we're using:

use std::{env, io::Error};

use futures_util::{SinkExt, StreamExt};
use tokio::net::{TcpListener, TcpStream};
use tokio_tungstenite::tungstenite::Message;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Error> {
    let addr = env::args()
        .nth(1)
        .unwrap_or_else(|| "127.0.0.1:7001".to_string());

    // Create the event loop and TCP listener we'll accept connections on.
    let try_socket = TcpListener::bind(&addr).await;
    let listener = try_socket.expect("Failed to bind");

    while let Ok((stream, _)) = listener.accept().await {
        tokio::spawn(accept_connection(stream));
    }

    Ok(())
}

// Sent 115255 messages in 1 sec, throughput: 576275 bytes/sec
async fn accept_connection(stream: TcpStream) {
    let ws_stream = tokio_tungstenite::accept_async(stream)
        .await
        .expect("Error during the websocket handshake occurred");

    let (mut write, mut read) = ws_stream.split();
    // Echo text and binary frames back; exit cleanly when the client disconnects
    // (the original `read.next().await.unwrap()` panicked on stream end).
    while let Some(msg) = read.next().await {
        let msg = msg.expect("Error reading message");
        match msg {
            Message::Text(text) => {
                write.send(Message::Text(text)).await.unwrap();
            }
            Message::Binary(bin) => {
                write.send(Message::Binary(bin)).await.unwrap();
            }
            _ => {}
        }
    }
}

We're comparing it to https://github.com/uNetworking/uWebSockets/blob/45e9ca2372a3758ece3384ac52797da3a3c8fa48/examples/EchoServer.cpp

And we're using this tool to benchmark it: https://github.com/uNetworking/uWebSockets/blob/45e9ca2372a3758ece3384ac52797da3a3c8fa48/benchmarks/load_test.c

Any ideas what is causing this suboptimal performance?

daniel-abramov commented 1 year ago

Hmm, this would require some investigation on our side. As far as I can remember, it was the fastest library at the time we initially released it, but things might have changed in the meantime. Or maybe we are as performant as (or close to) the competitors, but something fishy is going on.

I have not done any tests with the benchmarking tool in question, but here are some wild guesses:

Dumb question: did you compare the release builds of both executables?

UPD: one difference I can see in the code: the uWebSockets echo server enables compression options, while tokio-tungstenite does not support compression yet.

Also, I noticed this comment in your code:

// Sent 115255 messages in 1 sec, throughput: 576275 bytes/sec

whereas on the chart the throughput is much lower than that. Is the typo in the comment or in the chart?
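Incidentally, the two figures in that comment are internally consistent with an exact 5-byte payload per message, which at least shows the comment is arithmetically self-consistent; a quick check:

```rust
fn main() {
    // Figures quoted from the code comment in the issue.
    let messages: u64 = 115_255; // messages echoed in 1 second
    let bytes: u64 = 576_275;    // reported throughput in bytes/sec
    // The two figures imply exactly 5 bytes per message.
    assert_eq!(bytes, messages * 5);
    println!("bytes per message: {}", bytes / messages); // prints 5
}
```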

ry commented 1 year ago

Yes, both are release builds. I think that comment is out of date.

We're comparing against single-threaded Tokio because Deno, where we use this library, runs on the single-threaded runtime. In general, for an apples-to-apples comparison with uWebSockets, it is appropriate to use just a single thread.

We'll investigate the compression options. Good find. Thank you.

daniel-abramov commented 1 year ago

It seems like the Deno folks have decided to write a new performance-focused WebSocket library, so I think this issue can be closed.

However, since this question has been asked recently, I've included some notes on performance in the README, with a link to the relevant comment that summarises the improvements tungstenite must implement in order to get closer to fastwebsockets in terms of performance.