palewire / django-calaccess-campaign-browser

A Django app to refine, review and republish campaign finance data drawn from the California Secretary of State’s CAL-ACCESS database
http://django-calaccess-campaign-browser.californiacivicdata.org
MIT License

Loader management command needs to move much faster #48

Closed palewire closed 10 years ago

palewire commented 10 years ago

These loops must be part of the culprit. Could we write custom SQL commands that speed it up?

palewire commented 10 years ago

As a preliminary step, I've broken each of the loader methods into separate commands so that they can be refactored individually. You can see the result in the much-simplified build_campaign_finance command.

aboutaaron commented 10 years ago

@palewire very nice. What would be the first steps toward writing the custom SQL commands? cc @armendariz

palewire commented 10 years ago


I don't think there's any trick to it, just machete-like labor. My hope is that we can reduce some of the loops and joins to subqueries or temporary tables, and ultimately load the processed tables with swift bulk INSERT SQL commands.
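
To make the idea concrete, here is a minimal sketch of the difference between a row-by-row loop and a temporary table followed by a bulk INSERT ... SELECT. It is not the project's actual loader: the table names (raw_contribution, summary_loop, summary_bulk, tmp_totals), the columns, and the helper functions are all hypothetical, and it uses Python's built-in sqlite3 module with an in-memory database purely so the example runs on its own.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with hypothetical raw and summary tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_contribution (filer_id INTEGER, amount INTEGER);
        CREATE TABLE summary_loop (filer_id INTEGER, total INTEGER);
        CREATE TABLE summary_bulk (filer_id INTEGER, total INTEGER);
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO raw_contribution (filer_id, amount) VALUES (?, ?)",
        [(1, 100), (1, 50), (2, 25)],
    )
    return conn


def load_with_loop(conn):
    """Slow pattern: pull rows into Python, aggregate, then INSERT one at a time."""
    totals = {}
    for filer_id, amount in conn.execute(
        "SELECT filer_id, amount FROM raw_contribution"
    ).fetchall():
        totals[filer_id] = totals.get(filer_id, 0) + amount
    for filer_id, total in totals.items():
        conn.execute(
            "INSERT INTO summary_loop (filer_id, total) VALUES (?, ?)",
            (filer_id, total),
        )


def load_with_bulk_sql(conn):
    """Faster pattern: aggregate into a temporary table, then one bulk INSERT."""
    conn.execute(
        """
        CREATE TEMPORARY TABLE tmp_totals AS
        SELECT filer_id, SUM(amount) AS total
        FROM raw_contribution
        GROUP BY filer_id
        """
    )
    conn.execute(
        """
        INSERT INTO summary_bulk (filer_id, total)
        SELECT filer_id, total FROM tmp_totals
        """
    )
    conn.execute("DROP TABLE tmp_totals")
```

Both functions should leave the same rows in their summary tables. The second keeps the work inside the database, so the number of statements stays fixed no matter how many source rows there are. In a Django management command the same SQL could be sent through a database cursor instead of sqlite3, with syntax adjusted for whatever backend is in use.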

palewire commented 10 years ago

Commands that need to be refactored:

palewire commented 10 years ago

Our first rewrite is done, and most of the commands are much faster. But the summaries command is still a big logjam and needs attention in the future. I'll make a separate ticket for that.