seanmonstar / warp

A super-easy, composable, web server framework for warp speeds.
https://seanmonstar.com/post/176530511587/warp
MIT License

Optimize file reads in `fs` module #1071

Open joseluisq opened 1 year ago

This PR improves the performance of the `fs` module when reading files, in particular in the file stream implementation.

By refactoring the `fs` file stream to use `std::fs::File` instead of its Tokio async counterpart (`tokio::fs::File`), the `fs` module performs file I/O reads significantly faster. According to my tests on Linux, it serves ~4.8× as many requests per second (~384% more) while using ~57% less memory. Similar results should also be expected on other Unix targets.
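A minimal, std-only sketch of the idea, assuming a simplified synchronous setting: the file is opened with `std::fs::File` and read in fixed-size chunks through plain blocking `read` calls on the calling thread, with no hand-off to another thread per read. The real `fs` module yields chunks through an async stream; the `ChunkedFile` type and its `chunk_size` parameter are hypothetical and only illustrate the chunking logic.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical iterator that yields a file's contents in chunks,
/// reading directly from `std::fs::File` on the calling thread.
struct ChunkedFile {
    file: File,
    buf: Vec<u8>,
    done: bool,
}

impl ChunkedFile {
    fn open<P: AsRef<Path>>(path: P, chunk_size: usize) -> io::Result<Self> {
        Ok(ChunkedFile {
            file: File::open(path)?,
            // Guard against a zero-sized buffer, which would look like EOF.
            buf: vec![0; chunk_size.max(1)],
            done: false,
        })
    }
}

impl Iterator for ChunkedFile {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        loop {
            match self.file.read(&mut self.buf) {
                // EOF: stop iterating.
                Ok(0) => {
                    self.done = true;
                    return None;
                }
                // Copy only the bytes actually read by this call.
                Ok(n) => return Some(Ok(self.buf[..n].to_vec())),
                // Retry reads interrupted by a signal.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                // Report any other error once, then stop.
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
    }
}
```

Each `read` blocks the thread that calls `next`, which is the trade-off this approach accepts. The `to_vec` copy per chunk keeps the sketch simple; a production version would probably reuse buffers instead.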

Results

Here is a simple benchmark on Linux, just to illustrate.

The `examples/file.rs` example was used for the tests.

```
# BEFORE: (~10.5MiB RAM used)

Running 20s test @ http://127.0.0.1:3030
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.93ms    1.26ms  72.48ms   83.25%
    Req/Sec    13.33k     1.38k   41.39k    89.39%
  Latency Distribution
     50%    1.79ms
     75%    2.44ms
     90%    3.16ms
     99%    4.88ms
  1062701 requests in 20.10s, 2.04GB read
Requests/sec:  52871.36
Transfer/sec:    103.92MB

# AFTER: (~4.5MiB RAM used)

wrk --latency -c100 -t4 -d20s http://127.0.0.1:3030
Running 20s test @ http://127.0.0.1:3030
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   357.46us  174.88us   4.51ms   79.99%
    Req/Sec    64.42k     2.96k   69.96k    81.75%
  Latency Distribution
     50%  314.00us
     75%  415.00us
     90%  571.00us
     99%    1.00ms
  5126993 requests in 20.04s, 9.84GB read
Requests/sec: 255878.14
Transfer/sec:    502.93MB
```

Basically, ~4.84× the requests per second (255,878 vs. 52,871 req/sec, i.e. ~384% more) while using ~57.14% less memory (~4.5 MiB vs. ~10.5 MiB).
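A relative change depends on which value it is measured against. Measured against the old throughput, the new one is (255878.14 - 52871.36) / 52871.36 ≈ 384% higher; dividing the same gap by the new throughput instead gives ≈ 79.3%, which is how much lower the old throughput is. The small helpers below (hypothetical names, plain arithmetic) recompute these figures, plus the memory saving from the approximate RAM readings.

```rust
// Figures copied from the wrk output and RAM readings above.
const BEFORE_RPS: f64 = 52_871.36;
const AFTER_RPS: f64 = 255_878.14;
const BEFORE_MIB: f64 = 10.5;
const AFTER_MIB: f64 = 4.5;

/// Percentage by which `to` is larger than `from`, relative to `from`.
fn percent_increase(from: f64, to: f64) -> f64 {
    (to - from) / from * 100.0
}

/// Percentage by which `to` is smaller than `from`, relative to `from`.
fn percent_decrease(from: f64, to: f64) -> f64 {
    (from - to) / from * 100.0
}
```

Here `percent_increase(BEFORE_RPS, AFTER_RPS)` comes out at about 383.96, `percent_decrease(AFTER_RPS, BEFORE_RPS)` at about 79.34, and `percent_decrease(BEFORE_MIB, AFTER_MIB)` at about 57.14.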

Context

Below is a quote from the Tokio blog about Linux io_uring support, which I think is self-explanatory:

All tokio-uring operations are truly async, unlike APIs provided by tokio::fs, which run on a thread pool. Using synchronous filesystem operations from a thread pool adds significant overhead. With io-uring, we can perform both network and file system operations asynchronously from the same thread. But, io-uring is a lot more. – https://tokio.rs/blog/2021-07-tokio-uring
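The thread-pool overhead the quote mentions can be modelled with the standard library alone. This sketch is a simplification, not how Tokio is implemented: a hypothetical single-threaded `BlockingWorker` stands in for Tokio's blocking pool, and the hypothetical `read_offloaded` ships every `read` call to it, moving the file handle and buffer to the worker and waiting for them to come back. Each chunk therefore costs a boxed job, two channel messages and a cross-thread wake-up on top of the read itself, which is the kind of per-operation overhead that reading with `std::fs::File` on the calling thread avoids.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// A boxed unit of blocking work sent to the worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical single-thread stand-in for a blocking thread pool.
struct BlockingWorker {
    tx: Sender<Job>,
}

impl BlockingWorker {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        // The worker runs jobs until every Sender has been dropped.
        thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        BlockingWorker { tx }
    }

    /// Runs `f` on the worker thread and blocks until its result comes back.
    fn run<T, F>(&self, f: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (rtx, rrx) = mpsc::channel();
        self.tx
            .send(Box::new(move || {
                let _ = rtx.send(f());
            }))
            .expect("worker thread has stopped");
        rrx.recv().expect("worker dropped the job")
    }
}

/// Reads a whole file in chunks, offloading every `read` call to the worker.
/// The file and buffer are moved into each job and handed back afterwards.
fn read_offloaded(worker: &BlockingWorker, path: &Path, chunk_size: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut out = Vec::new();
    loop {
        let (f, b, res) = worker.run(move || {
            let mut file = file;
            let mut buf = buf;
            let res = file.read(&mut buf);
            (file, buf, res)
        });
        file = f;
        buf = b;
        match res {
            Ok(0) => return Ok(out),
            Ok(n) => out.extend_from_slice(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
}
```

Replacing the `worker.run(...)` call with a direct `file.read(&mut buf)` gives the synchronous variant this PR prefers.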

Credits

Not all of this is my own work; other people helped make it possible, such as the https://github.com/weihanglo/sfz project (inspiration) and the contributors to SWS.

--

We're already enjoying this optimization in SWS, which is why I'm sharing it with the warp folks too 😅.

Let me know what your thoughts are on this.