gratipay / gratipay.com

Here lieth a pioneer in open source sustainability. RIP
https://gratipay.news/the-end-cbfba8f50981

Load test #1152

Closed bruceadams closed 11 years ago

bruceadams commented 11 years ago

After #1151 and some earlier events, we need to get in front of this.

Step one: A simple load test of not-logged-in fetch of the front page.

I'm very willing and mostly able to do this. My problem here is: how do I get a realistic deployment of Gittip? I don't understand the details of how Gittip is deployed on Heroku.

trinary commented 11 years ago

This: https://devcenter.heroku.com/articles/fork-app seems like the way to go to get a real testbed. We can fork, test for an hour, then destroy the fork to keep costs low.
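Roughly, the cycle would look like this (app names here are placeholders in the style of the Heroku docs):

# fork the production app (config and Postgres data come along), run the
# test against the copy, then throw the copy away to keep costs down
heroku fork -a sourceapp loadtest-copy

# ... run the load test against loadtest-copy ...

# tear the fork down when finished
heroku apps:destroy -a loadtest-copy --confirm loadtest-copy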

bruceadams commented 11 years ago

@trinary nice! I'm very much a newbie to Heroku. Any idea what the name of my "sourceapp" is for "heroku fork ..."?

bruceadams commented 11 years ago

OK. I figured out the name, "gittip"! What a surprise... I cannot fork it:

$ heroku fork -a gittip gittip-loadtest
 !    You do not have access to the app gittip.

clone1018 commented 11 years ago

We'll need @whit537

bruceadams commented 11 years ago

@clone1018 Right. I would've been stunned if I could clone gittip, including all the data. Getting to the error message meant I had a reasonably correct command line.

trinary commented 11 years ago

What are we thinking as far as a stress-test suite goes? I've used siege and ApacheBench (ab).

Do we know what kind of concurrency we were (not) dealing with today? Simulating thousands of concurrent browsers ends up being a challenge for a single machine, and you can end up bottlenecked somewhere other than the app rather quickly (bandwidth, source ports, etc.).

bruceadams commented 11 years ago

@trinary I was thinking of starting out with ab. I'm happy to use siege instead.

My understanding is that our production collapse issues, such as #1151, are due to hits to the front page. Several people are making suggestions about how to make the front page perform better under load. I want measurements we can use to describe what we can handle now, and repeatable (ideally simple) tests so we can make comparative measurements of improvements.

Since Gittip has been crushed a few times, I have been assuming I could crush it with ab running on a single host, especially if that host is located in AWS EC2 us-east (which, I think, is physically near to Gittip's servers).
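For a first pass I'm picturing something like the following, pointed at the Heroku fork rather than at production (the fork hostname is an assumption):

# 1000 requests at a concurrency of 10 against the front page
ab -n 1000 -c 10 http://gittip-loadtest.herokuapp.com/

# roughly the same shape in siege: 10 concurrent users, 100 repetitions each
siege -c 10 -r 100 http://gittip-loadtest.herokuapp.com/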

bruceadams commented 11 years ago

I'll figure out how to publish my initial load test results soon (possibly in a couple of hours after I get home).

I used Apache Bench https://httpd.apache.org/docs/2.2/programs/ab.html

Other load test tools that might be interesting and/or relevant. Most of these can coordinate multiple load machines.

bruceadams commented 11 years ago

This isn't exactly ideal, but the output from my several runs of ab is available here: http://bruceadams.us/gittip/

chadwhitacre commented 11 years ago

I started running ab but it's waaaaay slower than I expected--a verification of @bruceadams' results, as far as it goes. :-) Getting going on a VPS was an order of magnitude more work for me, so I didn't proceed. I deleted the gittip-load-test app for now.

clone1018 commented 11 years ago

You know, I'm not sure there's any point in me running the rest of these, purely based on the first one, which had the exact same results as yours, @bruceadams.

Yours: http://bruceadams.us/gittip/ab-1000x10-2.txt
Mine: https://gist.github.com/clone1018/6045107

clone1018 commented 11 years ago

Is there a python profiler we can run on the application?

clone1018 commented 11 years ago

Same test on a static page: https://gist.github.com/clone1018/6045138

bruceadams commented 11 years ago

@clone1018 nice work, thanks!

Running a profiler on the application is certainly the next step. I don't know what is available in Python land. I'm sure there is something.
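One option from the standard library is cProfile. As a sketch: if we had a small one-off script that builds the homepage response once (shown here as a hypothetical render_homepage.py), we could do:

# profile one homepage render and print functions sorted by cumulative time
python -m cProfile -s cumulative render_homepage.py

# or save the stats for later inspection
python -m cProfile -o homepage.prof render_homepage.py
python -m pstats homepage.prof
# at the pstats prompt: sort cumulative, then stats 20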

chadwhitacre commented 11 years ago

Let's work on this so @eric-s-raymond can send us some traffic. :-)

zbynekwinkler commented 11 years ago

I have not looked into it yet, but my past experience suggests the problem is not in the Python code but in querying the db. IFF only the homepage is the problem, I'd suggest pregenerating the HTML for the anonymous user and serving that as a plain file (no db queries, no Python). We already have a thread that prepares some tables, so it can be done there.
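The idea is to do this inside that existing background thread; purely to illustrate the end state (no db queries, no Python for anonymous hits), the same idea as a standalone script might look like this (URL and paths are assumptions):

#!/bin/sh
# snapshot the rendered homepage once a minute and serve the copy to
# anonymous visitors from the web server instead of hitting the app
while true; do
    curl -s https://www.gittip.com/ -o /var/www/cache/index.html.tmp \
      && mv /var/www/cache/index.html.tmp /var/www/cache/index.html
    sleep 60
done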

After that, a log of queries with the time they took would take us a looong way towards scaling the site. I think I've seen an issue for that but cannot find it right now :(
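One way to get such a log is from Postgres itself, which can record every statement slower than a threshold along with its duration. On a database where we have the needed privileges (this is a superuser-only setting; the database name here is an assumption):

# log every statement that takes longer than 100 ms
psql "$DATABASE_URL" -c "ALTER DATABASE gittip SET log_min_duration_statement = 100;"

Slow statements and their timings then show up in the Postgres server log.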

chadwhitacre commented 11 years ago

I have not looked into it yet, but my past experience suggests the problem is not in the Python code but in querying the db.

This is my expectation as well. We had plenty of traffic spikes in the first year but it wasn't until we had data in the db that we crashed. So far the homepage is the big problem, yes. One thing I realized is that #1413 could probably help with those queries. We can also think about static for anon, yes.

After that, a log of queries with the time they took would take us a looong way towards scaling the site. I think I've seen an issue for that but cannot find it right now :(

42?

chadwhitacre commented 11 years ago

What's the deliverable on this ticket? A one-time load test? Or an ongoing load-test? Or ... ?

@bruceadams Do you still have the regular load-test going? If we publish that spreadsheet, would that satisfy this ticket?

chadwhitacre commented 11 years ago

@bruceadams Can we get an update on this ticket? What's the deliverable here?

clone1018 commented 11 years ago

I'm not speaking for Bruce, but the idea of this ticket was to get some data about how Gittip performs on Heroku and compare it to other places or config changes. I think it can be closed.

bruceadams commented 11 years ago

Sorry to be slow responding here. My goals with this ticket were:

  1. Set up and run a realistic and simple load test -- we did this with the Heroku fork and the Apache Bench (ab) tool.
  2. Establish a couple of benchmarks:
     a. What is our response time under moderately high load?
     b. What is our breaking point (how many users can we support)?

I think we came fairly close to achieving each of these, so I'm closing the ticket.

There is the larger question of a repeatable load test. As we change things, especially core infrastructure or core implementation, how do our response time and breaking point change? Also, if we see a big request flood coming our way, say due to some big publicity, are we ready? A load test should let us answer that question, with a fair amount of confidence, in advance.
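A minimal repeatable harness could be as simple as running the same command with the same parameters every time and keeping the output with a timestamp, so runs can be compared as the code changes (the fork hostname is assumed; the 1000x10 shape matches the earlier runs):

STAMP=$(date +%Y%m%dT%H%M%S)
ab -n 1000 -c 10 http://gittip-loadtest.herokuapp.com/ > "ab-1000x10-$STAMP.txt"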

zbynekwinkler commented 11 years ago

There is the larger question of a repeatable load test. As we change things, especially core infrastructure or core implementation, how do our response time and breaking point change? Also, if we see a big request flood coming our way, say due to some big publicity, are we ready? A load test should let us answer that question, with a fair amount of confidence, in advance.

That would be great to have.