Closed bruceadams closed 11 years ago
This: https://devcenter.heroku.com/articles/fork-app seems like the way to go to get a real testbed. We can fork, test for an hour, then destroy the fork to keep costs low.
@trinary nice! I'm very much a newbie to Heroku. Any idea what the name of my "sourceapp" is for "heroku fork ..."?
OK. I figured out the name, "gittip"! What a surprise... I cannot fork it:
$ heroku fork -a gittip gittip-loadtest
! You do not have access to the app gittip.
We'll need @whit537
@clone1018 Right. I would've been stunned if I could clone gittip, including all the data. Getting to the error message meant I had a reasonably correct command line.
What are we thinking as far as a stress test suite? I've used siege and apachebench (ab).
Do we know what kind of concurrency we were (not) dealing with today? Simulating thousands of concurrent browsers ends up being a challenge for a single machine, and you can end up bottlenecked somewhere other than the app rather quickly (bandwidth, source ports, etc.).
@trinary I was thinking of starting out with ab. I'm happy to use siege instead.
My understanding is that our production collapse issues, such as #1151, are due to hits to the front page. Several people are making suggestions about how to make the front page perform better under load. I want measurements that we can use to describe what we can handle now, and repeatable (ideally simple) tests so we can make comparative measurements of improvements.
Since Gittip has been crushed a few times, I have been assuming I could crush it with ab running on a single host, especially if that host is located in AWS EC2 us-east (which, I think, is physically near to Gittip's servers).
I'll figure out how to publish my initial load test results soon (possibly in a couple of hours after I get home).
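One way to turn individual ab runs into the comparative measurements described above is to pull the headline figures out of each saved report and compare them. This is only a minimal sketch: the function names `parse_ab_report` and `compare` are made up for illustration, and the patterns assume the usual summary lines that ab prints (`Complete requests:`, `Failed requests:`, `Requests per second:`, `Time per request:`).

```python
import re

# Summary lines as they normally appear in an ab report, for example:
#   Requests per second:    52.31 [#/sec] (mean)
# The keys on the left are hypothetical names chosen for this sketch.
_PATTERNS = {
    "complete_requests": r"^Complete requests:\s+(\d+)",
    "failed_requests": r"^Failed requests:\s+(\d+)",
    "requests_per_second": r"^Requests per second:\s+([\d.]+)",
    # Only the per-request mean, not the "across all concurrent requests" line.
    "mean_time_per_request_ms": r"^Time per request:\s+([\d.]+) \[ms\] \(mean\)$",
}


def parse_ab_report(text):
    """Return the headline numbers from one ab report as a dict of floats."""
    summary = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            summary[name] = float(match.group(1))
    return summary


def compare(before, after):
    """Return after/before for every metric that is present and non-zero in both."""
    return {
        key: after[key] / before[key]
        for key in before
        if key in after and before[key]
    }
```

Saving the full text of every run next to the commit or config it was measured against keeps the comparison repeatable; a ratio above 1.0 for `requests_per_second` would then mean the later run handled more load.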
I used Apache Bench https://httpd.apache.org/docs/2.2/programs/ab.html
Other load test tools that might be interesting and/or relevant. Most of these can coordinate multiple load machines.
ab, including just one load machine.
This isn't exactly ideal, but the output from my several runs of ab is available here: http://bruceadams.us/gittip/
I started running ab but it's waaaaay slower than I expected--a verification of @bruceadams' results, as far as it goes. :-) Getting going on a VPS was an order of magnitude more work for me, so I didn't proceed. I deleted the gittip-load-test app for now.
You know, I'm not sure there's any point in me running the rest of these, purely based on the first one, which gave the exact same results as yours, @bruceadams.
Yours: http://bruceadams.us/gittip/ab-1000x10-2.txt Mine: https://gist.github.com/clone1018/6045107
Is there a Python profiler we can run on the application?
Same test on a static page https://gist.github.com/clone1018/6045138
@clone1018 nice work, thanks!
Running a profiler on the application is certainly the next step. I don't know what is available in Python land. I'm sure there is something.
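For what it's worth, Python ships a profiler in the standard library (`cProfile`, with `pstats` for reporting). The following is a rough sketch of wrapping one call in it. `render_homepage` is a hypothetical stand-in, not Gittip's actual code, and `profile_call` is a helper name invented here.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the expensive call chains come first,
    # and keep the report to the top 15 entries.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue()


def render_homepage():
    # Hypothetical stand-in for whatever builds the front page.
    return "".join("<li>%d</li>" % n for n in range(10000))
```

Calling `profile_call(render_homepage)` returns the page plus a text report. If most of the cumulative time turns out to sit in database driver calls rather than in Python code, that would point at the queries instead of the application logic.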
Let's work on this so @eric-s-raymond can send us some traffic. :-)
I have not looked into it yet but my past experience suggests the problem is not in the python code but in querying the db. IFF only the homepage is the problem, I'd suggest pregenerating the html for the anonymous user and serving that as a plain file (no db queries, no python). We already have a thread to prepare some tables so it can be done there.
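A minimal sketch of the pregeneration idea, assuming some existing function can already produce the anonymous homepage as a string. The name `publish_static_page` and the target path are hypothetical; the point is only that the file is swapped in atomically, so the web server never serves a half-written page.

```python
import os
import tempfile


def publish_static_page(html, path):
    """Write html to path atomically so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        # mkstemp creates the file owner-only; loosen it so a web server
        # running as another user can read it (an assumption about the setup).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename over any previous version
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The background thread mentioned above could call this each time it refreshes its tables, for example `publish_static_page(rendered_html, "www/index.anon.html")`, where both the variable and the path are placeholders.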
After that a log of queries with the time they took would send us a looong way towards scaling the site. I think I've seen an issue for that but cannot find it right now :(
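A query log with timings could be sketched roughly like this. It uses `sqlite3` purely as a stand-in so the example is self-contained; Gittip's real database layer is different, and `timed_query` and `run_query` are hypothetical helper names.

```python
import logging
import sqlite3
import time
from contextlib import contextmanager

log = logging.getLogger("query_timing")


@contextmanager
def timed_query(sql, sink=None):
    """Log how long the wrapped block took, tagged with the SQL text."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        log.info("%.2f ms  %s", elapsed_ms, sql)
        if sink is not None:
            # Optional in-memory record, handy for sorting the slowest queries.
            sink.append((elapsed_ms, sql))


def run_query(conn, sql, params=(), sink=None):
    """Execute sql on a DB-API connection and time it, including the fetch."""
    with timed_query(sql, sink):
        return conn.execute(sql, params).fetchall()
```

Sorting the collected `(elapsed_ms, sql)` pairs in descending order gives a quick list of which queries to look at first.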
> I have not looked into it yet but my past experience suggests the problem is not in the python code but in querying the db.
This is my expectation as well. We had plenty of traffic spikes in the first year but it wasn't until we had data in the db that we crashed. So far the homepage is the big problem, yes. One thing I realized is that #1413 could probably help with those queries. We can also think about static for anon, yes.
> After that a log of queries with the time they took would send us a looong way towards scaling the site. I think I've seen an issue for that but cannot find it right now :(
What's the deliverable on this ticket? A one-time load test? Or an ongoing load-test? Or ... ?
@bruceadams Do you still have the regular load-test going? If we publish that spreadsheet, would that satisfy this ticket?
@bruceadams Can we get an update on this ticket? What's the deliverable here?
I'm not speaking for Bruce, but the idea of this ticket was to get some data about how Gittip performs on Heroku and compare it to other places or config changes. I think it can be closed.
Sorry to be slow responding here. My goals with this ticket were:
I think we came fairly close to achieving each of these, so I'm closing the ticket.
There is the larger question of a repeatable load test. As we change things, especially core infrastructure things or core implementation, how does our response time and break point change? Also, if we see a big request flood coming our way, say due to some big publicity, are we ready? A load test should let us answer that question, with a fair amount of confidence, in advance.
> There is the larger question of a repeatable load test. As we change things, especially core infrastructure things or core implementation, how does our response time and break point change? Also, if we see a big request flood coming our way, say due to some big publicity, are we ready? A load test should let us answer that question, with a fair amount of confidence, in advance.
That would be great to have.
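As a starting point for the repeatable test discussed here, the sketch below ramps concurrency in steps and records failures and response times at each level, which gives a rough view of where response time degrades. It is not a replacement for ab or siege. All names (`fetch_once`, `run_step`, `ramp`) and the default levels are assumptions made for illustration, and, as noted earlier in the thread, a single load machine may become the bottleneck before the app does.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url, timeout=10.0):
    """Return (ok, seconds) for a single anonymous GET of url."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        ok = False
    return ok, time.perf_counter() - start


def run_step(url, concurrency, total, fetch=fetch_once):
    """Issue `total` requests with `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url), range(total)))
    times = sorted(seconds for ok, seconds in results if ok)
    return {
        "concurrency": concurrency,
        "requests": total,
        "failures": sum(1 for ok, _ in results if not ok),
        # Upper median of the successful requests, None if all failed.
        "median_s": times[len(times) // 2] if times else None,
        "max_s": times[-1] if times else None,
    }


def ramp(url, levels=(1, 5, 10), requests_per_level=50, fetch=fetch_once):
    """Run one step per concurrency level and return the per-level summaries."""
    return [run_step(url, level, requests_per_level, fetch) for level in levels]
```

Pointing `ramp` at a forked test app rather than production, and keeping the levels and request counts fixed between runs, is what makes the numbers comparable across infrastructure or code changes. The `fetch` parameter exists so the harness itself can be exercised without network access.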
After #1151 and some earlier events, we need to get in front of this.
Step one: A simple load test of not-logged-in fetch of the front page.
I'm very willing and mostly able to do this. My problem here is how do I get a realistic deployment of Gittip? I don't understand the details of how Gittip is deployed on Heroku.