jefflunt opened this issue 11 years ago
So, if the max load before response times start to suffer is somewhere around 1500 rpm (per the load testing below), then the requested delay should go up by 1x for every 300 rpm (5 rps) of additional load.
So at 300 rpm (5 rps) the server should ask for 2x the normal delay, at 600 rpm (10 rps) 3x, and so on, maxing out as load approaches 1500 rpm (25 rps).
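Just to pin down the arithmetic, here's a minimal sketch (Python purely for illustration; the function name is made up, and I'm reading "maxing out" as clamping the multiplier to the 1.0-5.0x range described further down):

```python
# Hypothetical mapping from observed requests-per-minute to the delay
# multiplier the server would ask clients to apply: one bump per 300 rpm,
# clamped at 5x.
def delay_multiplier(rpm):
    return min(5, rpm // 300 + 1)

# delay_multiplier(150)   -> 1  (light load, normal refresh rate)
# delay_multiplier(300)   -> 2  (first bump)
# delay_multiplier(600)   -> 3
# delay_multiplier(1200)  -> 5  (ceiling; stays at 5x up through ~1500 rpm)
```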
It's possible that when the server is right on the edge of one of these bumps, requesting 1x more delay will cause the rps to almost instantly drop back down a level, and we'll wind up with a see-saw effect. It might be better to increase the requested delay quickly but decay it slowly over time, so that the server asks for more breathing room as soon as it believes it needs it, but is slow to give that breathing room back. That should help eliminate the see-saw.
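A rough sketch of that fast-attack/slow-decay idea, again with invented names and an arbitrary decay interval that would need tuning:

```python
import time

class DelayGovernor:
    """Raise the requested delay multiplier immediately, but let it fall
    back only one step per decay interval, to avoid the see-saw."""

    def __init__(self, decay_seconds=60):
        self.decay_seconds = decay_seconds
        self.current = 1
        self.last_change = time.monotonic()

    def update(self, target_multiplier):
        now = time.monotonic()
        if target_multiplier >= self.current:
            # Jump up (or hold) as fast as the load measurement asks.
            self.current = target_multiplier
            self.last_change = now
        elif now - self.last_change >= self.decay_seconds:
            # Step back down only once per decay interval.
            self.current -= 1
            self.last_change = now
        return self.current
```

That way, if the rps hovers right at one of the bumps, the multiplier stays at the higher value instead of flapping every few seconds.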
I was using log2viz to do basic load testing against a 3-process, 1-dyno, unicorn-based setup on Heroku. What I found was that we could handle 1500 rpm (25 requests per second) while keeping the mean response time under 100 ms, with a 95th percentile response time of 500-600 ms.
In order to address occasional spikes in traffic (theoretically caused by a large number of collaborators on a thought page), the server should monitor the number of requests it's getting per second, so that when it crosses a given load threshold it asks clients to refresh the page less frequently, scaling the requested delay from 1.0x to 5.0x.
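The monitoring piece could be as simple as a sliding-window counter, something like the sketch below. (In-process only; with 3 unicorn workers each process would only see roughly a third of the traffic, so in practice the count would probably need to live in a shared store, or each worker's figure would need to be scaled up. All names here are made up.)

```python
import time
from collections import deque

class RequestRateMonitor:
    """Count requests over a sliding window to estimate requests per minute."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.timestamps = deque()

    def record_request(self):
        self.timestamps.append(time.monotonic())

    def rpm(self):
        now = time.monotonic()
        # Drop anything older than the window, then scale to a per-minute rate.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) * (60.0 / self.window)
```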
With this scheme, at the highest load the server would request updates no more frequently than once every 5 seconds, and no less frequently than once every 5 minutes (since the current spread is 1 second to 1 minute).
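In other words, the client would just stretch whatever interval it was already going to use (purely illustrative; how the multiplier actually travels to the client, e.g. a response header or a field in the poll response, is an open question):

```python
# The current refresh interval already varies between 1 and 60 seconds;
# the server-requested multiplier (1-5) simply stretches it.
def refresh_interval(base_seconds, multiplier):
    return base_seconds * multiplier

# At maximum load (5x):
# refresh_interval(1, 5)   -> 5    (once every 5 seconds at the fast end)
# refresh_interval(60, 5)  -> 300  (once every 5 minutes at the slow end)
```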
The current setup can serve 100 concurrent collaborators satisfactorily (with a 95th percentile response time of 500-600 ms). By spreading things out to as much as 5x, we should, I think, be able to handle 500 concurrent users without too much trouble.