async-std support with trait objects

yu-re-ka commented 2 years ago

Alternative to #1345

yu-re-ka commented 2 years ago

Surprisingly using the async-std/async-io based UdpSocket and Timer code (still running under the tokio runtime otherwise) leads to better performance in the bulk bench, even though it spawns a background thread.

bench results

``` $ cargo run --bin bulk --release Client 0 stats: Overall download stats: Transferred 1073741824 bytes on 1 streams in 1.86s (549.73 MiB/s) Stream download metrics: │ Throughput │ Duration ──────┼───────────────┼────────── AVG │ 549.75 MiB/s │ 1.86s P0 │ 549.50 MiB/s │ 1.86s P10 │ 550.00 MiB/s │ 1.86s P50 │ 550.00 MiB/s │ 1.86s P90 │ 550.00 MiB/s │ 1.86s P100 │ 550.00 MiB/s │ 1.86s $ sed "s/TokioRuntime/AsyncStdRuntime/g" -i bench/src/lib.rs $ cargo run --bin bulk --release --features runtime-async-std [...] Client 0 stats: Overall download stats: Transferred 1073741824 bytes on 1 streams in 1.70s (602.03 MiB/s) Stream download metrics: │ Throughput │ Duration ──────┼───────────────┼────────── AVG │ 602.25 MiB/s │ 1.70s P0 │ 602.00 MiB/s │ 1.70s P10 │ 602.50 MiB/s │ 1.70s P50 │ 602.50 MiB/s │ 1.70s P90 │ 602.50 MiB/s │ 1.70s P100 │ 602.50 MiB/s │ 1.70s $ ```

We could also try to choose the correct runtime for the user:

pub fn runtime() -> Box<dyn Runtime> {
    #[cfg(feature = "runtime-tokio")]
    {
        if let Ok(_) = tokio::runtime::Handle::try_current() {
            return TokioRuntime;
        }
    }

    #[cfg(feature = "runtime-async-std")]
    {
        return AsyncStdRuntime;
    }

    panic!("No usable runtime found");
}

But I don't know, should the user still be able to override this?

Ralith commented 2 years ago

Surprisingly using the async-std/async-io based UdpSocket and Timer code (still running under the tokio runtime otherwise) leads to better performance in the bulk bench, even though it spawns a background thread.

Interesting! How consistent is that across runs?

We could also try to choose the correct runtime for the user:

I actually really like the look of that helper, though I'm not sure it should be exposed directly. Let's use that to avoid complicating the Endpoint::client and Endpoint::server helpers (the entire purpose of those is to be simple), but take an explicit argument in Endpoint::new for when explicit selection is required.

yu-re-ka commented 2 years ago

Interesting! How consistent is that across runs?

After trying a few more things I think it comes down to the amount of threads the load is distributed across. When using the single-threaded tokio executor (like the bench is), it is bottle-necked on that single thread, and off-loading thread to another thread takes load off the executor thread. When using a multi-threaded runtime, a few threads (4-6 threads on my machine) do increase performance because it allows the load to be handled by multiple cores, but increasing the thread count even further will result in slower performance because of lots of moving tasks around.

Let's use that to avoid complicating the Endpoint::client and Endpoint::server helpers (the entire purpose of those is to be simple), but take an explicit argument in Endpoint::new for when explicit selection is required.

Sounds good! I will implement that later.

yu-re-ka commented 2 years ago

(Close was accidental. Pressed the wrong button)

Ralith commented 2 years ago

After trying a few more things I think it comes down to the amount of threads the load is distributed across.

Interesting indeed; I wouldn't have guessed that there was any opportunity for parallelism there. Good investigation!

parazyd commented 2 years ago

@yu-re-ka This still seems to have a hard dependency on the tokio crate. Is that intended?

yu-re-ka commented 2 years ago

Yes, that was the suggestion from the maintainers:

I also don't think tokio's sync primitives are runtime-specific, so those can be left in on the first pass to simplify review.

I tested that it only builds the sync feature of the tokio crate, so you don't get a dependency on any of the runtime/executor code.

yu-re-ka commented 2 years ago

I have removed the runtime parameter from Endpoint::client, Endpoint::server (these now use the default mechanism described earlier), and Endpoint::connect, Endpoint::connect_with (these use the runtime the endpoint was created with).

parazyd commented 2 years ago

Brilliant work

yu-re-ka commented 2 years ago

Rebased, resolved conflicts

yu-re-ka commented 2 years ago

I can see from your comments that you would like things to be changed somehow, but there are so many options for how I could interpret these comments. I now just chose any one option, so you will have to make new comments if my interpretations did not match your comment intention - or send me a link to some guideline I can follow.

dignifiedquire commented 2 years ago

why was this closed?

yu-re-ka commented 2 years ago

I don't have motivation to refactor this again. Anyone feel free to open a new PR and use these changes if it is helpful.

Ralith commented 2 years ago

This was very close to completion, so I took it the last few steps in https://github.com/quinn-rs/quinn/pull/1364.

quinn-rs / quinn

async-std support with trait objects #1346