hypothesis / proxy-server

serve third party webpages (currently limited to pdfs) with the hypothesis client embedded and configured

Enable proxy caching in nginx #7

Open · hmstepanek opened 4 years ago

hmstepanek commented 4 years ago

Problem

When proxying pages, the page can take ~4s to load in the browser for a simple pdf. One would expect subsequent proxy requests to load much faster, since the response should be cached at that point in nginx and Cloudflare; however, when testing I found this was not the case. While investigating caching within nginx I found that the proxy cache is disabled by default. The proxy cache caches the responses to the requests that via2 makes to the pages it is proxying (the HEAD request response for the Content-Type header and the GET request response for the third party page content). Caching these proxy requests has several advantages (a rough config sketch follows the list below):

  1. It makes subsequent requests to proxy the same url faster because the responses from the third party page are already cached in nginx.
  2. Because the responses are already cached in nginx, the load on nginx is lower, which means it can handle more requests at once.
  3. When a pdf is proxied in via, a HEAD/GET request is issued to the third party page during the initial proxy request to determine the content type, and an html page is returned. This html page then issues a second request from the browser to that third party page (through the proxy server) for the pdf content. The proxy cache would already have cached the third party response from the initial HEAD/GET content-type request, which means the second request for the pdf content would hit the cache instead of sending a second request to the third party server.
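
For reference, here is a minimal sketch of what enabling the proxy cache might look like in the nginx config. The cache path, zone name, sizes, TTL, and upstream name are illustrative placeholders, not the project's actual settings:

```nginx
# Inside the http {} block: define a cache zone for third party responses.
# Path, zone name, sizes and TTLs here are placeholders.
proxy_cache_path /var/cache/nginx/third_party
                 levels=1:2
                 keys_zone=third_party:10m   # shared memory zone for cache keys
                 max_size=1g                 # cap on-disk cache size
                 inactive=60m                # evict entries unused for 60 minutes
                 use_temp_path=off;

server {
    location / {
        proxy_pass http://upstream_backend;   # placeholder upstream

        proxy_cache third_party;              # use the zone defined above
        proxy_cache_valid 200 10m;            # cache successful responses for 10 minutes
        proxy_cache_key $scheme$proxy_host$request_uri;

        # nginx converts HEAD to GET for caching purposes by default, so the
        # initial HEAD content-type probe and the later GET for the pdf content
        # can share one cache entry (proxy_cache_convert_head defaults to on).
        proxy_cache_convert_head on;

        # Expose cache status (MISS/HIT/EXPIRED/...) for debugging.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```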

Performance improvement

When I turned this on locally I found it decreased response times from ~200ms to ~2ms. This means each request sent from the browser to the via2 server would be ~198ms faster (assuming the same request has been issued before, so the response is already in the cache).

| Latencies | mean | p50 | p95 | p99 | max |
| --- | --- | --- | --- | --- | --- |
| Without proxy cache | 222.442901ms | 216.009804ms | 286.948443ms | 309.982766ms | 317.502399ms |
| With proxy cache | 3.210505ms | 3.078673ms | 4.993552ms | 6.066541ms | 6.224982ms |
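
For anyone reproducing these numbers, one way to confirm that repeat requests are actually being served from the cache is to surface nginx's cache status in the access log; the format and file names below are illustrative:

```nginx
# Inside the http {} block: include the cache status in the access log so
# HIT/MISS rates can be checked while benchmarking. Names are illustrative.
log_format cache_status '$remote_addr "$request" $status '
                        'cache=$upstream_cache_status time=${request_time}s';

server {
    access_log /var/log/nginx/cache_status.log cache_status;
    # ... proxy_cache / proxy_pass configuration as in the earlier sketch ...
}
```

Combined with the `X-Cache-Status` header from the earlier sketch, requesting the same proxied url twice (e.g. with `curl -sI`) should show `MISS` followed by `HIT` if the cache is working.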

Caveats

Steps forward

Is it worth enabling this even with the caveats mentioned? IMO yes, since it's easy to do: worst case you see no performance improvement, and best case you see a significant one.