rstudio / httpuv

HTTP and WebSocket server package for R

Consider using RCPP_USE_UNWIND_PROTECT for improved performance #244

Open atheriel opened 5 years ago

atheriel commented 5 years ago

I was doing some casual profiling of some httpuv servers and noticed that the call stack was deeper than I expected. You can see in the flame graph below (for the "hello, world" example) that there is a tryCatch/eval stack between later::execCallbacks() and the actual call() method (anonymous here) for the server:

[Flame graph: httpuv-before]

I eventually worked out that this tryCatch/eval setup is manually created in C++ code when evaluating an Rcpp::Function, in order to prevent longjmps. This might be well-known. However, it is also apparent from the flame graph that this particular stack seems to impose a significant amount of overhead on all requests, and I wondered if removing it could improve performance.

As it turns out, Rcpp has a not-very-well-advertised configuration option to avoid this in favour of newer C-level features of R, RCPP_USE_UNWIND_PROTECT (originally implemented by Lionel Henry, I believe). This flag is used by some existing packages, dplyr among them, so it probably has few downsides.

To illustrate the potential difference, I ran some load testing for the "hello, world" server against vanilla master:

$ wrk -d 20 -c 10 -t 2 "http://127.0.0.1:5000/"
Running 20s test @ http://127.0.0.1:5000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.55ms    1.57ms  41.95ms   96.25%
    Req/Sec     2.03k   139.92     2.39k    75.25%
  80786 requests in 20.01s, 7.70MB read
Requests/sec:   4038.18
Transfer/sec:    394.35KB

And with PKG_CPPFLAGS += -DRCPP_USE_UNWIND_PROTECT:

$ wrk -d 20 -c 10 -t 2 "http://127.0.0.1:5000/"
Running 20s test @ http://127.0.0.1:5000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.76ms    1.36ms  39.96ms   97.78%
    Req/Sec     2.98k   288.29     6.39k    77.56%
  118950 requests in 20.10s, 11.34MB read
Requests/sec:   5917.94
Transfer/sec:    577.92KB

As you can see, throughput is about 47% higher (roughly 5,918 vs. 4,038 requests/sec), and mean latency drops from 2.55ms to 1.76ms. The accompanying flame graph, showing the simplified stack, is below:

[Flame graph: httpuv-after]
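For anyone who wants to check the size of the improvement, here is a small sketch that recomputes it from the two wrk summaries above. The numbers are copied from those runs; the variable names and the `percent_change` helper are only illustrative, and a single 20-second run per build is a rough indication rather than a rigorous benchmark.

```python
# Figures copied from the two wrk runs in this issue.
baseline_rps = 4038.18   # Requests/sec, vanilla master
unwind_rps = 5917.94     # Requests/sec, built with -DRCPP_USE_UNWIND_PROTECT

baseline_latency_ms = 2.55   # average latency, vanilla master
unwind_latency_ms = 1.76     # average latency, with the flag


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


throughput_gain = percent_change(baseline_rps, unwind_rps)
latency_change = percent_change(baseline_latency_ms, unwind_latency_ms)

print(f"throughput:   {throughput_gain:+.1f}%")
print(f"mean latency: {latency_change:+.1f}%")
```

On these numbers the throughput gain comes out at a bit under 47% and the mean latency falls by about 31%.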

In light of this, I believe that you should consider building httpuv with RCPP_USE_UNWIND_PROTECT by default, if possible.

wch commented 5 years ago

Sounds promising, but I think we'd really need to be careful that errors, interrupts, and C++ exceptions are handled correctly.