jsfenfen / 990-xml-database

Django app to consume and store 990 data and metadata
BSD 2-Clause "Simplified" License

Added multithreading to load_filings.py #10

Closed NobodyAnon closed 6 years ago

NobodyAnon commented 6 years ago

Added basic multithreading for a faster run.

jsfenfen commented 6 years ago

Thanks @NobodyAnon. I'm gonna merge this, but save your command as load_filings_multithreaded. This looks helpful, but I think it has the effect of multiplying the amount of memory required by 8? Did you find this sped things up?

NobodyAnon commented 6 years ago

It does up the amount of resources required, both RAM and CPU, but it lets you do things like spin up a large AWS instance to load data much faster, then drop the instance size when the push is complete.

This brought load time down from 6 hours to 2 hours (combined with temporarily assigning more resources to the box and database).
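For anyone reading along, here is a minimal sketch of the general pattern being discussed: a fixed pool of worker threads, each handling one filing at a time. It is not the code from this PR. `load_filing` and `load_all` are hypothetical names, the body of `load_filing` is a placeholder for the real per-filing work, and the default of 8 workers is only chosen to echo the factor mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def load_filing(object_id):
    """Hypothetical stand-in for the per-filing work (fetch, parse, save)."""
    return f"loaded {object_id}"


def load_all(object_ids, workers=8):
    """Run load_filing over object_ids on a pool of `workers` threads.

    pool.map returns results in the same order as the input, even though
    the individual calls may finish out of order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_filing, object_ids))


if __name__ == "__main__":
    # Illustrative IDs only; real object IDs would come from the filing index.
    print(load_all(["id-1", "id-2", "id-3"], workers=2))
```

Making the worker count a parameter is what allows the trade-off described above: each worker keeps its own filing in flight, so a higher count costs more RAM and CPU on a large instance, and a lower count fits a smaller one.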