Improve throughput when vectored IO is not enabled

xiaoyawei commented 10 months ago

As discussed in https://github.com/hyperium/h2/issues/711, the current implementation of sending data is suboptimal when vectored I/O is not enabled: data frame's head is likely to be sent in a separate TCP segment, whose payload is of only 9 bytes.

This PR adds some specialized implementaton for non-vectored I/O case. In short, it sets a larget chain threhold, and also makes sure a data frame's head is sent along with the beginning part of the real data payload.

All existing unit tests passed. Also I take a look at the e2e https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs but realize that all the benchmarks there are for the case of vectored I/O if the OS supports vectored I/O. There isn't a specific case for non-vectored I/O so I am not sure how to proceed with benchmark for performance evaluations. @seanmonstar Would appreciate if there's some advice on how to do benchmarks for this change.

This PR also includes a trivial bug fix, and addressing a straightforward TODO comment

seanmonstar commented 10 months ago

There isn't a specific case for non-vectored I/O so I am not sure how to proceed with benchmark for performance evaluations.

For comparison purposes, you could write a struct NonVecIo<T>(T), and implement AsyncRead and AsyncWrite for it, forwarding each method on, except for is_write_vectored(). You could do that in the end_to_end.rs file, and compare before and after. I don't think we need commit that change, though...

xiaoyawei commented 9 months ago

@seanmonstar I forked hyper and put together a commit https://github.com/xiaoyawei/hyper/commit/6f3c37939236899a23e8d4f694d00cc16b979f2c to benchmark vectored io. In my test setup, parallelism is 10, both request and response payload is of 500 bytes. I run the benchmark with h2 0.3.9 and this fix 3 times, and the result is as follows

`0.3.9`

242576 ns/iter (+/- 32535)
257091 ns/iter (+/- 82970)
258892 ns/iter (+/- 216218)

`w/ this fix`

184308 ns/iter (+/- 57584)
181720 ns/iter (+/- 51599)
179991 ns/iter (+/- 38387)

Looks like the perf improvement is pretty solid

seanmonstar commented 9 months ago

Nice! Thanks for putting that together.

@nox or @Noah-Kennedy, does this seem good in your cases too?

Noah-Kennedy commented 9 months ago

I'll take a look today. Not sure if @nox has opinions or not.

xiaoyawei commented 9 months ago

@seanmonstar @Noah-Kennedy

I removed the gitignore rules; let me know if there are other feedback, thanks ;)

seanmonstar commented 9 months ago

Excellent work, thank you for the benchmark results, they certainly help in feeling confident with the change.

hyperium / h2

Improve throughput when vectored IO is not enabled #712

`0.3.9`

`w/ this fix`