hyperium / tonic

A native gRPC client & server implementation with async/await support.
https://docs.rs/tonic
MIT License

Throughput doesn't increase with cores/threads count #1405

Closed: trungda closed this issue 1 year ago

trungda commented 1 year ago

Bug Report

Hi, we are using tonic/hyper on a multithreaded tokio runtime to serve gRPC requests. The requests are trivial: the server receives a request and immediately returns "HELLO". When we benchmark across core counts (keeping the number of tokio worker threads equal to the number of CPU cores available), throughput stops increasing once the core count goes above 8, while CPU usage stays very low (around 10% on a 32-core instance).

Btw, we are running the server inside docker.

I am wondering whether there is a bottleneck somewhere, or whether something is wrong with our setup. Thank you :)

Version

0.8

Platform

Ubuntu

Crates

https://github.com/hyperium/tonic

Description

Our code is quite simple:

async fn server() -> Result<(), Box<dyn std::error::Error>> {
    // `service` and `signals::shutdown_signals_future` come from elsewhere in our crate.
    Server::builder()
        .tcp_keepalive(Some(Duration::from_secs(10)))
        .http2_keepalive_interval(Some(Duration::from_secs(10)))
        .timeout(Duration::from_secs(6))
        .add_service(service.clone())
        .serve_with_shutdown(
            "0.0.0.0:9000".parse()?, // serve_with_shutdown takes a SocketAddr, not a bare port
            signals::shutdown_signals_future()?,
        )
        .await?;
    Ok(())
}

fn main() {
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .worker_threads(8) // this is the variable being tuned
        .build()
        .unwrap()
        .block_on(server())
        .unwrap();
}
TroyKomodo commented 1 year ago

Are you using musl or glibc?

trungda commented 1 year ago

Are you using musl or glibc?

I think we are using glibc.

Diggsey commented 1 year ago

Have you been able to achieve higher throughput with a different web server in the same docker environment?

If not, then it sounds like your bottleneck is elsewhere and unrelated to hyper/tonic/tokio.

trungda commented 1 year ago

It could be related to the problem described here: https://medium.com/@fujita.tomonori/scalable-server-design-in-rust-with-tokio-4c81a5f350a3

trungda commented 1 year ago

Confirmed that it's due to the single event loop in the tokio runtime. Switching to multiple single-threaded runtimes drastically improved our performance.
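
Roughly, the shape is something like the sketch below (not our exact code): one OS thread per core, each with its own current_thread runtime and its own SO_REUSEPORT listener on the same port, so the kernel spreads incoming connections across the runtimes. It assumes the socket2 crate (with its "all" feature) and tokio-stream (with its "net" feature), and make_service() is a hypothetical stand-in for building the actual gRPC service.

use std::net::SocketAddr;

use socket2::{Domain, Protocol, Socket, Type};
use tokio::net::TcpListener;
use tokio_stream::wrappers::TcpListenerStream;
use tonic::transport::Server;

// Build a std listener on `addr` with SO_REUSEPORT set, so several runtimes
// can bind the same port and the kernel balances accepted connections
// between them.
fn reuseport_listener(addr: SocketAddr) -> std::io::Result<std::net::TcpListener> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
    socket.set_reuse_address(true)?;
    socket.set_reuse_port(true)?; // Linux/unix only
    socket.set_nonblocking(true)?; // required by tokio's TcpListener::from_std
    socket.bind(&addr.into())?;
    socket.listen(1024)?;
    Ok(socket.into())
}

fn main() -> std::io::Result<()> {
    let addr: SocketAddr = "0.0.0.0:9000".parse().unwrap();
    let cores = std::thread::available_parallelism()?.get();

    // One OS thread per core, each with its own single-threaded runtime,
    // its own listener, and its own tonic server.
    let mut handles = Vec::new();
    for _ in 0..cores {
        handles.push(std::thread::spawn(move || {
            let rt = tokio::runtime::Builder::new_current_thread()
                .enable_all()
                .build()
                .unwrap();
            rt.block_on(async move {
                let listener = TcpListener::from_std(reuseport_listener(addr).unwrap()).unwrap();
                Server::builder()
                    .add_service(make_service()) // hypothetical: builds the gRPC service
                    .serve_with_incoming(TcpListenerStream::new(listener))
                    .await
                    .unwrap();
            });
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}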

mouseless-eth commented 1 year ago

Confirmed that it's due to the single event loop in the tokio runtime. Switching to multiple single-threaded runtimes drastically improved our performance.

If possible, could you share a snippet of the code that you implemented? I implemented the same method as in the article, but some of my requests take up to 40ms to complete whilst the rest only take ~4ms to complete :/

trungda commented 1 year ago

How many client connections do you have? The method mentioned in the article is a common way to solve the many-connections problem, and it can also help with high RPS. If the number of client connections is lower than the number of cores, it won't help much, since some cores will sit idle. I was able to achieve roughly linear throughput growth after removing all cross-thread locks, as the sketch below illustrates.
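
To illustrate what I mean by cross-thread locks (hypothetical code, not ours): state behind a single Arc<Mutex<_>> cloned into every per-core server serializes every request on that one lock, while giving each runtime thread its own instance keeps the hot path free of cross-core contention.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

// Contended: one counter cloned into every per-core server; every request,
// on every core, takes the same lock.
#[derive(Clone, Default)]
struct SharedStats {
    requests: Arc<Mutex<u64>>,
}

impl SharedStats {
    fn record(&self) {
        *self.requests.lock().unwrap() += 1; // all cores serialize here
    }
}

// Mostly uncontended: construct one of these per runtime thread and hand it
// only to that thread's service; updates stay on one core's cache line, and
// the per-thread counters can be aggregated out of band if a global view is needed.
#[derive(Clone, Default)]
struct PerThreadStats {
    requests: Arc<AtomicU64>,
}

impl PerThreadStats {
    fn record(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed); // no lock, no cross-core traffic
    }
}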

zuston commented 10 months ago

Confirmed that it's due to the single event loop in the tokio runtime. Switching to multiple single-threaded runtimes drastically improved our performance.

What do you mean? Could you describe this in more detail? I am also suffering from this.

trungda commented 10 months ago

Pretty much, I just followed the suggestion from the article above, since I know that my workload is uniformly distributed.