openaddresses / machine

Scripts for running OpenAddresses on a complete data set and publishing the results.
http://results.openaddresses.io/
ISC License
97 stars 36 forks source link

Slow queries for results.openaddresses.io #777

Closed iandees closed 4 years ago

iandees commented 4 years ago

Queries for the main pages on results.openaddresses.io are slow enough to time out the gunicorn process and give back 502's lately. The RDS database is churning on queries:

image

We've had a couple previous attempts to improve query times (e.g. #774) but nothing is great. To resolve for now, I'm going to move data from the runs table into a new runs_archive copy and then delete data in the runs table before 2019-01-01.

This will result in some sources dropping (we'll lose our memory of the last successful run for sources that haven't had a successful run since 2019-01-01) but should significantly speed up the query and let things run again for another few years.

iandees commented 4 years ago

I ran the following SQL the other night:

machine=> create table runs_archive as select * from runs;
SELECT 991904

But then this delete was taking 18+ hours:

machine=> delete from runs where id <= 564648;

So I canceled it and moved the RDS instance over to a larger instance type. Queries are much faster now, but the instance is more expensive, so I'm going to try making it smaller again and see if creating a new instance helps.

iandees commented 4 years ago

I moved the instance type back to the original and it seems to be happier now. I think the previous instance, having sat around running for multiple years without stopping, was just broken.