rstudio / plumber

Turn your R code into a web API.
https://www.rplumber.io

Improve performance of Plumber in TechEmpower's API benchmarking #863

Open emilmahler opened 2 years ago

emilmahler commented 2 years ago

TechEmpower runs the most widely used API performance benchmarks, typically once per year: https://www.techempower.com/benchmarks/

We submitted a PR (a Docker image with six Plumber endpoints), which has just been accepted. This will allow us to compare Plumber performance with frameworks in other programming languages.

I don't know if this is the right place to post it (apologies if it's in the wrong place), but if experienced Plumber developers would like to improve the performance of our submission, now is the right time to do it, as TechEmpower will be running Round 21, the first in a year, in the coming week.

schloerke commented 2 years ago

I believe [you] should also do an httpuv submission. Similar to https://github.com/the-benchmarker/web-frameworks where plumber and httpuv were submitted. The {httpuv} submission would represent the upper bound of how fast {plumber} can go.
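For reference, a bare {httpuv} endpoint can be sketched roughly as below. This is a minimal illustration, not the benchmark submission: the JSON payload, the port, and the `serve()` helper name are assumptions chosen for the example.

```r
library(httpuv)

# An httpuv "app" is a list with a call() function that receives the request
# environment and returns a list with status, headers, and body.
app <- list(
  call = function(req) {
    list(
      status = 200L,
      headers = list("Content-Type" = "application/json"),
      # Illustrative payload, written by hand to avoid serialization overhead
      body = '{"message":"Hello, World!"}'
    )
  }
)

# Hypothetical helper: blocks and serves requests until interrupted.
serve <- function(port = 8080) {
  runServer("0.0.0.0", port, app)
}
```

Calling `serve()` starts the server. Because nothing sits between the socket and the handler here, the throughput of a server like this is roughly the ceiling any {plumber} route could approach.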

As for the {plumber} code, it looks like it is benchmarking database queries more than HTTP request handling. Do the random number values have to be fetched at request time, or can the 10k database rows be cached in memory?

Do you have a link to the route requirements? I could not find them 😞. Those requirements will really shape how the code can be improved.

Similar to https://github.com/the-benchmarker/web-frameworks, performance will not be "top tier" in the benchmark test. But we also have to remember R has a much larger statistical skill set that can be returned in a minimal amount of code.

emilmahler commented 2 years ago

I wasn't aware of https://github.com/the-benchmarker/web-frameworks, thanks for the link. With TechEmpower, you aren't allowed to cache results - they have to be processed one at a time. The focus is as much on how quickly the ORM/database handler can run Postgres queries as it is on delivering the JSON result through an endpoint.

One of our motivations behind this was to understand the limitations of Plumber, as performance is impacting us in our production environment. Hopefully there's some low-hanging fruit in Plumber and httpuv, so that performance can be brought up to the level of some of the slowest Python frameworks, like Django, in the coming years.

schloerke commented 2 years ago

Do you have a link to the route requirements? I could not find them. Those requirements will really shape how code can be improved.

The script should probably use the {pool} package so that database connections are kept open between requests rather than closed. It also looks like the queries are fairly fast, so using {future} would not help.
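A rough sketch of the {pool} approach follows. The environment variable names, the `world` table, its `randomnumber` column, and the `make_pool()` / `get_world_row()` helpers are all hypothetical and only illustrate the pattern; the actual submission may differ.

```r
library(DBI)
library(pool)

# Hypothetical helper: create the pool once at startup so that connections
# stay open and are reused across requests.
make_pool <- function() {
  dbPool(
    drv = RPostgres::Postgres(),
    host = Sys.getenv("DB_HOST", "localhost"),   # assumed variable names
    dbname = Sys.getenv("DB_NAME", "postgres"),
    user = Sys.getenv("DB_USER", "postgres"),
    password = Sys.getenv("DB_PASSWORD", "")
  )
}

# Hypothetical helper: fetch one row by id with a parameterised query.
# A pool object can be passed to DBI functions in place of a connection;
# it checks a connection out and returns it automatically.
get_world_row <- function(db, id) {
  dbGetQuery(
    db,
    "SELECT id, randomnumber FROM world WHERE id = $1",
    params = list(id)
  )
}
```

In a Plumber file, `db <- make_pool()` would run once at the top level, each route would call `get_world_row(db, id)`, and `poolClose(db)` would run on shutdown.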


> One of our motivations behind this was to understand the limitations of Plumber, as performance is impacting us in our production environment.

Can you explain what is happening?


> Hopefully there's some low hanging fruit behind Plumber and httpuv that means performance

Removing the default filters gives a big improvement in speed, but no better than 4x.
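One way this might look, as a sketch only: it assumes that passing an empty list to the `filters` argument of `pr()` creates a router without the built-in filters (query string, body, cookie parsing, and so on). The route path and payload are illustrative, and handlers that rely on parsed query or body arguments would no longer receive them.

```r
library(plumber)

# Assumption: filters = list() skips plumber's default filters entirely.
api <- pr(filters = list())

# Illustrative route; unboxed JSON turns length-one vectors into scalars.
api <- pr_get(
  api,
  "/json",
  function() list(message = "Hello, World!"),
  serializer = serializer_unboxed_json()
)

# To serve (blocks until interrupted):
# pr_run(api, host = "0.0.0.0", port = 8080)
```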