nprapps / elections18-general

2018 midterm election back-end: Associated Press data ETL, database, admin panel, and JSON output; iteration upon 2016 GE work
MIT License
3 stars 1 forks source link

Production admin panel prone to `504` gateway timeouts #30

Closed mileswwatkins closed 6 years ago

mileswwatkins commented 6 years ago

These can be solved with a refresh of the page, but can be annoying. Seem to happen relatively regularly.

mileswwatkins commented 6 years ago

Ugh, I'm pretty sure this is a uwsgi error, but the log isn't very helpful, at least at current verbosity:

[pid: 11322|app: 0|req: 95/95] 205.153.36.170 () {44 vars in 734 bytes} [Thu Oct 11 17:51:02 2018] GET /elections18/calls/senate/ => generated 332517 bytes in 1543 msecs (HTTP/1.1 200) 3 headers in 146 bytes (2 switches on core 0)
[pid: 11322|app: 0|req: 96/96] 205.153.36.170 () {44 vars in 734 bytes} [Thu Oct 11 17:51:13 2018] GET /elections18/calls/senate/ => generated 332517 bytes in 1471 msecs (HTTP/1.1 200) 3 headers in 146 bytes (2 switches on core 0)
[pid: 11322|app: 0|req: 97/97] 205.153.36.170 () {44 vars in 734 bytes} [Thu Oct 11 17:51:25 2018] GET /elections18/calls/senate/ => generated 332517 bytes in 1908 msecs (HTTP/1.1 200) 3 headers in 146 bytes (2 switches on core 0)
[pid: 11322|app: 0|req: 98/98] 205.153.36.170 () {50 vars in 936 bytes} [Thu Oct 11 17:51:27 2018] POST /elections18/calls/senate/call-chamber => generated 7 bytes in 20 msecs (HTTP/1.1 200) 3 headers in 141 bytes (1 switches on core 0)
[pid: 11322|app: 0|req: 99/99] 205.153.36.170 () {44 vars in 734 bytes} [Thu Oct 11 17:51:27 2018] GET /elections18/calls/senate/ => generated 51 bytes in 21 msecs (HTTP/1.1 500) 3 headers in 161 bytes (1 switches on core 0)
[pid: 11322|app: 0|req: 100/100] 205.153.36.170 () {44 vars in 734 bytes} [Thu Oct 11 17:51:37 2018] GET /elections18/calls/senate/ => generated 332521 bytes in 1861 msecs (HTTP/1.1 200) 3 headers in 146 bytes (2 switches on core 0)
...The work of process 11322 is done. Seeya!
worker 1 killed successfully (pid: 11322)
Respawned uWSGI worker 1 (new pid: 12114)
[pid: 12114|app: 0|req: 101/101] 205.153.36.170 () {44 vars in 781 bytes} [Thu Oct 11 17:51:47 2018] GET /elections18/calls/senate/ => generated 332521 bytes in 1429 msecs (HTTP/1.1 200) 3 headers in 146 bytes (1 switches on core 0)
[pid: 12114|app: 0|req: 102/102] 205.153.36.170 () {42 vars in 714 bytes} [Thu Oct 11 17:51:49 2018] GET /elections18/less/calls.less => generated 167879 bytes in 1515 msecs (HTTP/1.1 200) 3 headers in 130 bytes (1 switches on core 0)
mileswwatkins commented 6 years ago

Python app error, from earlier this week:

Exception on /elections18/calls/house/ [GET]
Traceback (most recent call last):
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/ubuntu/apps/elections18_general/virtualenv/lib/python3.5/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "app.py", line 61, in calls_admin
    chamber_call_override = results.first().meta.first().chamber_call_override
AttributeError: 'NoneType' object has no attribute 'meta'
mileswwatkins commented 6 years ago

cc https://github.com/nprapps/elections18-general/issues/24

mileswwatkins commented 6 years ago

Regardless of database-up-ness, why is a successful page request taking 30 seconds?! There's got to be a more effective, atomic query; I imagine right now there're many round-trips to the database to build a single page, and that's causing a lot of our woes.

mileswwatkins commented 6 years ago

But right now, refreshing the page can recover from a 504 with a refresh. And no more errors in the Python log.

mileswwatkins commented 6 years ago
screen shot 2018-10-24 at 16 42 21
mileswwatkins commented 6 years ago

Profiled, and it does look like the database call is expensive (and repeated many times) even locally:

app_1           | PATH: '/elections18/calls/house/'
app_1           |          3903909 function calls (3544002 primitive calls) in 52.186 seconds
app_1           |
app_1           |    Ordered by: internal time, call count
app_1           |    List reduced from 339 to 30 due to restriction <30>
app_1           |
app_1           |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
app_1           |      9721    9.013    0.001   10.521    0.001 {method 'execute' of 'psycopg2.extensions.cursor' objects}
app_1           |      9721    2.908    0.000    2.908    0.000 {method 'commit' of 'psycopg2.extensions.connection' objects}
app_1           |    534117    2.497    0.000    2.497    0.000 {built-in method builtins.isinstance}
app_1           | 136217/9721    2.433    0.000   13.464    0.001 /usr/local/lib/python3.7/site-packages/peewee.py:1842(_parse)
app_1           | 126490/9721    2.132    0.000   14.002    0.001 /usr/local/lib/python3.7/site-packages/peewee.py:1881(parse_node)
app_1           | 19444/9721    1.754    0.000   13.159    0.001 /usr/local/lib/python3.7/site-packages/peewee.py:1903(parse_node_list)
app_1           | 196700/186980    1.237    0.000    6.658    0.000 {built-in method builtins.getattr}
app_1           |     48698    1.222    0.000    2.227    0.000 /usr/local/lib/python3.7/site-packages/peewee.py:1785(_parse_field)
app_1           |    235859    0.986    0.000    0.986    0.000 {method 'append' of 'list' objects}
app_1           |     10839    0.950    0.000    5.002    0.000 /usr/local/lib/python3.7/site-packages/peewee.py:2429(process_row)
app_1           |      9721    0.870    0.000   17.756    0.002 /usr/local/lib/python3.7/site-packages/peewee.py:1984(generate_select)
app_1           |     34571    0.825    0.000   49.700    0.001 /opt/elections18/templates/calls.html:5(root)
app_1           |     19441    0.735    0.000    1.406    0.000 /usr/local/lib/python3.7/site-packages/peewee.py:2818(_model_shorthand)
app_1           |     83683    0.726    0.000    1.995    0.000 {built-in method builtins.setattr}
app_1           | 87596/132    0.684    0.000   49.862    0.378 {method 'join' of 'str' objects}
app_1           |     73964    0.646    0.000    0.936    0.000 /usr/local/lib/python3.7/site-packages/peewee.py:924(__set__)
app_1           |    107117    0.591    0.000    0.591    0.000 /usr/local/lib/python3.7/site-packages/peewee.py:1726(quote)
mileswwatkins commented 6 years ago

That refactor drops the time down to about 20% of what it was before locally, going to see how it is in production.

mileswwatkins commented 6 years ago

Hell yes; production's down from 30s to load the House page to 2s:

screen shot 2018-10-25 at 13 17 43
mileswwatkins commented 6 years ago

Marking this as mostly finished for now. Haven't seen a 504 since the updated deployment. Occasionally there will be 500s when the database is empty, which is to be expected.