I see essentially the same results. The maximum transfer rate I'm seeing is about 6 Kbytes/sec, and I get roughly that for any concurrency level (ab -c#) from 2 up through 20. With a single ab thread, I get 5 Kbytes/sec.
The best thing we can do right now is consolidate the CSS and JS files and put all static assets on a CDN. That would bring the request count per page down to 1. That alone would be an eightfold improvement for users with a cold cache (the storm of new visitors we are expecting on Friday, once Chad's interview is out).
I guess what I would really like is to get performance like this (seznam.cz is a fairly high-profile site in CZ, something like the Czech Google):
C:\home\zbynek\www.gittip.com>ab.exe -c20 -n100 http://www.seznam.cz/st/3.33.21/css/medium.css?3.33.21
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.seznam.cz (be patient).....done
Server Software: nginx/1.4.2
Server Hostname: www.seznam.cz
Server Port: 80
Document Path: /st/3.33.21/css/medium.css?3.33.21
Document Length: 4167 bytes
Concurrency Level: 20
Time taken for tests: 0.773 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 450000 bytes
HTML transferred: 416700 bytes
Requests per second: 129.35 [#/sec] (mean)
Time per request: 154.616 [ms] (mean)
Time per request: 7.731 [ms] (mean, across all concurrent requests)
Transfer rate: 568.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 2 7 7.8 4 37
Processing: 13 98 53.2 85 347
Waiting: 5 62 60.0 50 347
Total: 22 105 59.7 89 379
Percentage of the requests served within a certain time (ms)
50% 89
66% 97
75% 104
80% 110
90% 166
95% 268
98% 354
99% 379
100% 379 (longest request)
but even something like this would be ok :).
C:\home\zbynek\www.gittip.com>ab.exe -c20 -n100 http://assets-gittipllc.netdna-ssl.com/10.1.36/hero-bg.png
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking assets-gittipllc.netdna-ssl.com (be patient).....done
Server Software: NetDNA-cache/2.2
Server Hostname: assets-gittipllc.netdna-ssl.com
Server Port: 80
Document Path: /10.1.36/hero-bg.png
Document Length: 9746 bytes
Concurrency Level: 20
Time taken for tests: 4.849 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 1002800 bytes
HTML transferred: 974600 bytes
Requests per second: 20.62 [#/sec] (mean)
Time per request: 969.897 [ms] (mean)
Time per request: 48.495 [ms] (mean, across all concurrent requests)
Transfer rate: 201.94 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 39 47 11.7 41 87
Processing: 69 786 203.6 781 1187
Waiting: 65 458 258.1 448 1187
Total: 130 834 203.4 822 1252
Percentage of the requests served within a certain time (ms)
50% 822
66% 840
75% 898
80% 956
90% 1102
95% 1188
98% 1235
99% 1252
100% 1252 (longest request)
This issue appears to be fundamental to Aspen. Running Gittip locally (which is far too often difficult; see #1619), I get worse throughput with more requesting threads.
$ ab -n100 -c1 http://127.0.0.1:8537/assets/reset.css | grep '(mean)'
Requests per second: 14.73 [#/sec] (mean)
Time per request: 67.910 [ms] (mean)
$ ab -n100 -c2 http://127.0.0.1:8537/assets/reset.css | grep '(mean)'
Requests per second: 16.75 [#/sec] (mean)
Time per request: 119.411 [ms] (mean)
$ ab -n100 -c4 http://127.0.0.1:8537/assets/reset.css | grep '(mean)'
Requests per second: 5.83 [#/sec] (mean)
Time per request: 686.101 [ms] (mean)
$ ab -n100 -c8 http://127.0.0.1:8537/assets/reset.css | grep '(mean)'
Requests per second: 4.86 [#/sec] (mean)
Time per request: 1646.592 [ms] (mean)
The CPU on my 4-core laptop never went anywhere near 100%. My theory is that Aspen is, in effect, single-threaded.
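A minimal probe along those lines (hypothetical, not something from this thread): fire N concurrent GETs at the local server and compare wall-clock time against the sum of the individual request times. If the server really handles requests in parallel, the wall clock should come out far smaller than the sum.

import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://127.0.0.1:8537/assets/reset.css"  # assumes a local Gittip instance
N = 8

def fetch(_):
    start = time.time()
    urlopen(URL).read()
    return time.time() - start

with ThreadPoolExecutor(max_workers=N) as pool:
    t0 = time.time()
    durations = list(pool.map(fetch, range(N)))
    wall = time.time() - t0

print("sum of request times: %.3fs  wall clock: %.3fs" % (sum(durations), wall))
# wall close to the sum       -> requests are effectively serialized
# wall much less than the sum -> requests really do overlap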
This behavior is much like Rails not too long ago. Heroku has made a lot of money because it allows one-request-at-a-time Rails apps to scale up easily with many dynos.
I assume the quickest (though not cheap) fix is to add more Gittip Aspen servers. That requires some code changes, to move user session handling (for Twitter and Bitbucket) out of memory.
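For illustration only, a hedged sketch of what "out of memory" session handling could look like - a hypothetical sessions table and plain psycopg2, not Gittip's actual implementation - so that any web process can validate a token:

import uuid
import psycopg2

def create_session(conn, participant_id):
    # store the token in Postgres so every server sees it, not just one process
    token = uuid.uuid4().hex
    with conn.cursor() as cur:
        cur.execute("INSERT INTO sessions (token, participant_id) VALUES (%s, %s)",
                    (token, participant_id))
    conn.commit()
    return token

def participant_for(conn, token):
    # any Gittip server can authenticate the request by looking the token up
    with conn.cursor() as cur:
        cur.execute("SELECT participant_id FROM sessions WHERE token = %s", (token,))
        row = cur.fetchone()
    return row[0] if row else None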
What is your value of MIN_THREADS in local.env? In production we are now running with 40.
Under the hood, Aspen uses the HTTP server from CherryPy (Cheroot). I've never looked at it closely, so I don't know much about it. When I get home I'm planning to try a simple WSGI hello-world app under different WSGI containers/servers (like Gunicorn or uWSGI) locally, to see what difference the server makes.
I hadn't even thought about MIN_THREADS. Looking now, it's set to the default of MIN_THREADS=10.
I've always said Aspen is a little slow. I wanted to do some tests like the ones @zwn will be doing, but I never found the time. Good luck, and let us know what happens!
Testing a single static page on a vanilla Aspen server gives much better performance. It is fast enough that I'm not sure I can tell how well it processes requests in parallel. (These tests were run on my 4-core i5 laptop.)
ba:~$ ab -n1000 -c1 http://127.0.0.1:8080/ 2>/dev/null | grep '(mean)'
Requests per second: 793.02 [#/sec] (mean)
Time per request: 1.261 [ms] (mean)
ba:~$ ab -n1000 -c2 http://127.0.0.1:8080/ 2>/dev/null | grep '(mean)'
Requests per second: 441.07 [#/sec] (mean)
Time per request: 4.534 [ms] (mean)
ba:~$ ab -n1000 -c4 http://127.0.0.1:8080/ 2>/dev/null | grep '(mean)'
Requests per second: 383.14 [#/sec] (mean)
Time per request: 10.440 [ms] (mean)
ba:~$ ab -n1000 -c8 http://127.0.0.1:8080/ 2>/dev/null | grep '(mean)'
Requests per second: 447.95 [#/sec] (mean)
Time per request: 17.859 [ms] (mean)
ba:~$ ab -n1000 -c16 http://127.0.0.1:8080/ 2>/dev/null | grep '(mean)'
Requests per second: 490.06 [#/sec] (mean)
Time per request: 32.649 [ms] (mean)
@bruceadams was your test using postgression for your local Gittip?
@clone1018 yes, I used Postgression. I don't expect fetching a static page to exercise the database. I could capture database network traffic to be sure.
Since we serve the static page via aspen as a webserver, is there a chance we're opening a database connection?
We should also look at the effect of logging on performance. I have a tingly sense that lots of stdout chatter could slow us down.
Logging has the potential to have a huge impact on performance. The best logging frameworks jump through all kinds of hoops to avoid synchronization, excessive object creation, and other overhead.
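As a hedged illustration of the kind of hoop-jumping meant here (assuming a modern Python with logging.handlers.QueueHandler, not necessarily what we'd deploy): request threads only enqueue log records, and a background listener thread does the slow I/O.

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue(-1)                        # unbounded queue
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(QueueHandler(log_queue))           # cheap, non-blocking in request threads

listener = QueueListener(log_queue, logging.StreamHandler())  # slow writes happen here
listener.start()
# ... handle requests; call listener.stop() at shutdown ...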
Since we serve the static page via aspen as a webserver, is there a chance we're opening a database connection?
If we are, it'd be in a hook, and the places to check would be configure-aspen.py and gittip/wireup.py.
@bruceadams I am not getting the same results as you.
$ ab -c1 -n100 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 166.02 [#/sec] (mean)
Time per request: 6.024 [ms] (mean)
$ ab -c2 -n100 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 117.18 [#/sec] (mean)
Time per request: 17.068 [ms] (mean)
$ ab -c4 -n100 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 87.36 [#/sec] (mean)
Time per request: 45.790 [ms] (mean)
$ ab -c8 -n100 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 78.20 [#/sec] (mean)
Time per request: 102.305 [ms] (mean)
$ ab -c16 -n100 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 77.45 [#/sec] (mean)
Time per request: 206.598 [ms] (mean)
But the pattern is clear: the more threads, the worse. I also have a 4-core i5 laptop. I am able to get to about 50% CPU utilization in total.
@zwn interesting. Do you have a local database? If so, your faster results suggest the database is being hit when serving this static asset. (I haven't had a chance to check.) (Also, I could not use "localhost" because my machine treats "localhost" as IPv6, and something misbehaved.)
@bruceadams I have a local db. I am not sure what work we do against the db per request (I have looked, but we need some instrumentation to be sure).
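One rough way to get that instrumentation (a hypothetical sketch, not something in the codebase) is to count statements with a custom psycopg2 cursor and compare the counter before and after a request:

import logging
import psycopg2
import psycopg2.extensions

class CountingCursor(psycopg2.extensions.cursor):
    statements = 0
    def execute(self, sql, args=None):
        CountingCursor.statements += 1
        logging.info("SQL #%d: %s ...", CountingCursor.statements, sql.split()[0])
        return super(CountingCursor, self).execute(sql, args)

# create connections with e.g.
#   conn = psycopg2.connect(dsn, cursor_factory=CountingCursor)
# then hit /assets/reset.css and see whether the counter moves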
Running gunicorn -w 10 with the default hello-world app from http://docs.gunicorn.org/en/latest/run.html:
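For reference, the hello-world app from those docs is roughly the WSGI callable below; save it as e.g. myapp.py (module name mine) and run gunicorn -w 10 myapp:app.

def app(environ, start_response):
    # simplest possible WSGI application
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data))),
    ])
    return iter([data])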
$ ab -c1 -n100 http://localhost:8000/ | grep '(mean)'
Requests per second: 1291.87 [#/sec] (mean)
Time per request: 0.774 [ms] (mean)
$ ab -c10 -n100 http://localhost:8000/ | grep '(mean)'
Requests per second: 5396.07 [#/sec] (mean)
Time per request: 1.853 [ms] (mean)
$ ab -c100 -n100 http://localhost:8000/ | grep '(mean)'
Requests per second: 3472.34 [#/sec] (mean)
Time per request: 28.799 [ms] (mean)
That is a different ballpark.
It is possible to mount Aspen as a WSGI app on gunicorn:
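(A minimal invocation, judging from the steps given later in this thread, is something like gunicorn aspen.wsgi:website -b:8537.)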
Watching network traffic to the database: yes, there is a round trip to the database for every fetch of assets/reset.css. I'll see if I can figure out where that is in the code (by stumbling around in the dark).
@bruceadams It's probably in a hook somewhere. Actually, it's probably related to authentication.
I'm in the process of setting up a load testing environment locally. I have the FreeBSD machine I set up while my laptop was out of commission. I'm installing ab there to hit Gittip on my laptop over the LAN.
We removed the extra db hit per request in #1626. However, I'm still seeing ~3 req/sec for reset.css in production, as we were the other day.
For kicks, I stood up an instance of Gittip at Digital Ocean (smallest droplet, NY1). It's reading from the production database and is configured with MIN_THREADS=40 and DATABASE_MAXCONN=100, as in production. I edited configure-aspen.py by hand to turn off the call to utils.update_homepage_tables_once.
In general, I'm seeing ~115 r/s for /assets/reset.css and ~9 r/s for /. From our Heroku dyno we're getting ~6 and ~3, respectively. In other words, we're faster from the DO droplet than we are from the Heroku dyno, even though both are reading from the same Heroku Postgres database.
Two possible ways to account for the discrepancy are SSL and a baseline load of ~2 req/sec, which are present on the Heroku dyno but not the DO droplet.
Testing locally I am seeing ~177 r/s for /assets/reset.css and 19 r/s for /. That's with a recent backup in a Postgres on a different machine on the same LAN.
I ran two ab commands at once, one from the laptop and another from the FreeBSD machine. The FreeBSD machine is serving Gittip connected to Postgres on the laptop. I set up a slow -n300 -c1 trickle against / on the laptop, then fired off a -n100 -c20 run against /assets/reset.css from the FreeBSD machine (I get lots of failed requests in such circumstances, but some runs complete). I'm seeing 130 r/s for /assets/reset.css even when the server is under load for /, and meanwhile the / run sees 11 r/s.
This result seems to argue against the "baseline load" explanation for the slowness of reset.css at Heroku.
Using Gunicorn locally, I am seeing results similar to Cheroot's: ~170 r/s for /assets/reset.css and ~20 r/s for /.
Edit: I may very well not have been running Gunicorn with multiple processes, cf. https://github.com/gittip/www.gittip.com/issues/1617#issuecomment-27149658.
Which, I suppose, is all to say that, yes, something is really broken in our http serving.
:broken_heart:
BTW, here are the steps I followed to run Gittip with Gunicorn on port 8537:
[gittip] $ pip install gunicorn
Edit local.env and add ASPEN_WWW_ROOT=www and ASPEN_PROJECT_ROOT=.
[gittip] $ foreman run -e local.env gunicorn aspen.wsgi:website -b:8537
In general I've been using -n300 -c20 as we did the other day.
I believe that this ticket is about the two-order-of-magnitude difference in serving /assets/reset.css between Heroku and the other environments we've looked at. The difference for / between Heroku and elsewhere is not as pronounced. My hunch is that the smaller difference has the same root cause as the larger one, but in the dynamic case that cause is being swamped by time spent querying the database and other factors.
I've opened a support ticket with Heroku:
We're stuck trying to tune our application's performance. Could we get some eyeballs?
https://github.com/gittip/www.gittip.com/issues/1617
tl;dr In a couple of other environments (local, VPS), our app serves between 100 and 200 requests per second for a static page, and between 10 and 20 requests per second for a dynamic page. On our Heroku dyno we're only seeing between 1 and 10 requests per second for both static and dynamic pages. Why are we off by an order of magnitude for dynamic pages and two orders of magnitude for static pages?
P.S. Could we cc: bruce.adams@acm.org and zbynek.winkler@gmail.com on this ticket, please? Thanks! :-)
Which, I suppose, is all to say that, yes, something is really broken in our http serving.
:broken_heart:
I was kind of afraid of that. BTW: !m @whit537. I am excited to see you kept debugging this even after I went to bed yesterday :smile:
I believe that this ticket is about the order of magnitude difference in serving /assets/reset.css between Heroku and the other environments we've looked at.
I agree. That is what prompted me to open this ticket in the first place. If we're serving a static file at 4 kB/s, something must be horribly broken. :cry:
I just benchmarked http://motivate.im/, which is running at Heroku using this static hosting setup. I'm seeing 40 r/s, which is an order of magnitude more than we're seeing, and one or two orders of magnitude less than I would expect.
Wait, so motivate.im is running as just index.html or something? And you're getting 40 r/s? Seriously?
@clone1018 Yes. :-(
So after running these tests against a base nginx install, here are the results (for the static page):
whit's personal home internet (13 hops): 95.68 rps
DO VPS next door: 4137.70 rps
dedicated server 12 hops away: 324.90 rps
I have tested our default development setup locally: Cheroot with 10 threads versus Gunicorn running with 10 workers, both on a 4-core laptop. The main difference is threads versus processes.
Threads (cheroot):
$ ab -c20 -n1000 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 63.99 [#/sec] (mean)
Time per request: 312.534 [ms] (mean)
Processes (gunicorn):
$ ab -c20 -n1000 http://localhost:8537/assets/reset.css | grep '(mean)'
Requests per second: 742.54 [#/sec] (mean)
Time per request: 26.935 [ms] (mean)
Doing the same for the current homepage.
Threads:
$ ab -c20 -n1000 http://localhost:8537/ | grep '(mean)'
Requests per second: 33.79 [#/sec] (mean)
Time per request: 591.962 [ms] (mean)
Processes:
$ ab -c20 -n1000 http://localhost:8537/ | grep '(mean)'
Requests per second: 165.96 [#/sec] (mean)
Time per request: 120.510 [ms] (mean)
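Side note, hedged: one plausible factor in the threads-versus-processes gap is that Python threads share the GIL, so the Python-level work per request doesn't spread across cores the way worker processes do. A toy illustration of that effect (not a claim about where Gittip's time actually goes):

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls):
    with executor_cls(max_workers=4) as pool:
        t0 = time.time()
        list(pool.map(burn, [2000000] * 4))
        return time.time() - t0

if __name__ == "__main__":
    print("threads:   %.2fs" % timed(ThreadPoolExecutor))
    print("processes: %.2fs" % timed(ProcessPoolExecutor))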
I have created a new Heroku app. Given the choice, I picked the Europe region since that is closer to me :). I have not added anything to it (it returns the default 'get started' page) and repeated the load test.
$ ab -c20 -n1000 http://sheltered-lowlands-9512.herokuapp.com/ | grep '(mean)'
Requests per second: 104.96 [#/sec] (mean)
Time per request: 190.554 [ms] (mean)
I have not been able to find out how many hops away the site is. All I know is that it is more than 13. I have created another one in the default U.S. region.
$ ab -c20 -n100 http://calm-temple-2891.herokuapp.com/ | grep '(mean)'
Requests per second: 62.91 [#/sec] (mean)
Time per request: 317.930 [ms] (mean)
I've also run a redirect load test on the main gittip site:
$ ab -c20 -n100 http://gittip.herokuapp.com/ | grep '(mean)'
Requests per second: 45.44 [#/sec] (mean)
Time per request: 440.115 [ms] (mean)
That should rule out any SSL-related slowdowns. Another reference point is redoing the static-file benchmark:
$ ab -c20 -n100 http://motivate.im/ | grep '(mean)'
Requests per second: 45.22 [#/sec] (mean)
Time per request: 442.240 [ms] (mean)
I did this to get a baseline to compare against. If we get to these numbers on the main site, it means the rest is on Heroku's side (?)
I have put hello world up on both test apps - one under Gunicorn (lowlands), the other under uWSGI (calm-temple). The rps and time remained similar to the default setup (uWSGI seems to be responding better - the longest request time on Gunicorn is larger than on uWSGI, despite the fact that uwsgi/calm-temple-2891 is further from me).
Could it be some form of anti-DoS measure? We could do some coordinated ab testing of one of the test apps I've set up. If our results are the same whether we run the tests all at the same time or separately, that could mean Heroku can handle the load but is somehow limiting requests per IP.
I've run a simple timeit test, and it seems that the single-threaded performance is only a bit worse than a 2.67 GHz i5, which should be plenty.
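The exact expression isn't recorded here; the check was along these lines, with an arbitrary CPU-bound expression standing in (my choice, not the one actually used):

import timeit
# compare this number between the dyno and a local 2.67 GHz i5
print(timeit.timeit("sum(i * i for i in range(10000))", number=1000))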
Hi,
Are you able to add the New Relic add-on here so we can dig into this a bit?
Neil
Thanks Neil. We looked at adding New Relic a while ago but it was pretty complicated, so we ended up logging metrics with Librato instead (#1508). Do you have an idea of what you're looking for? Here's our dashboard.
Also, did we get [Bruce's email] and [Zbynek's email] cc'd on this ticket?
Thanks! :-)
chad
I don't have access to the ticket.
I still do not have access to that ticket. And I do not think we need New Relic. Those two simple apps I've set up display similar behavior, and they basically do nothing (hello world). If I had access to that ticket, I'd have told Neil myself. Does Neil have a GitHub nick so we can ping him from here? There is lots of info here that is worth reading.
cc: @neilmiddleton
Hey @zwn, you should get updates to that ticket, but you won't have access to it. If we keep updates in the ticket then whoever is taking a look at it has everything they need.
Thanks @neilmiddleton. We were told over on https://help.heroku.com/tickets/100019 that we could request a manual CC on a ticket, and that we'd be able to have multiple people access one ticket. This works for https://help.heroku.com/tickets/100017, which @zwn opened and which I have access to.
Interesting, you're all added as cc's so it should be the same. Let me know if it's continuing.
I haven't taken the time to read through all the comments yet, but if there's anything that I can do to help, I'm at your service.
Make sure that you aren't running ab on OS X. The one that Apple ships has some odd limitations that yield underperforming results.
Something is, IMO, horribly broken in Gittip's HTTP serving. The file 'reset.css' is under 1 kB in size. When testing the site with 'ab' it is easy to see that, no matter the number of threads, with just 20 parallel connections we cannot get more than 5 requests per second. That is like a 4 kB/s transfer rate for a plain small file fully cached in memory. BTW: that transfer rate is just about twice the size of our logs.
Just for perspective - I believe each browser uses up to 6 connections for a single domain. The homepage makes 8 requests to the main site. When someone on Twitter with 100 followers sends out a tweet and 1 out of 10 clicks the link, we get 10 users, 60 parallel connections, and 80 requests... and we slow down to a crawl :(
And I am not even mentioning that when hitting the homepage, the most db-optimized page on Gittip, we get not 5 but 1 request per second. So, if the site does not die right away (that is the better case), we go to 10 to 15 seconds per request.