scraperwiki / spreadsheet-download-tool

A ScraperWiki plugin for downloading data from a box as a CSV or Excel spreadsheet
BSD 2-Clause "Simplified" License

Do not grow memory without bound #34

Open drj11 opened 10 years ago

drj11 commented 10 years ago

extract.py's memory usage seems to grow without bound while it runs (observe with htop).

Not only is this bad form generally, but on the free plan (which is limited to 512 MB) it causes the process to be abruptly and mysteriously killed (which leaves lots of files behind: issue #31).

pwaller commented 10 years ago

Good luck with that.

For a 21 MB all_tables.xls it has a maximum resident set of 1.8 GB.

pwaller commented 10 years ago

This one is still an issue since writes aren't streaming yet.
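A minimal sketch of what streaming writes could look like for the CSV path, using only the standard library (the `write_csv_streaming` helper and `generate_rows` generator are hypothetical, not part of the codebase): rows are consumed from a generator and written one at a time, so memory stays flat regardless of table size.

```python
import csv
import tempfile

def write_csv_streaming(path, rows):
    """Write rows one at a time; memory use stays flat
    because at most one row is held in memory at once."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:  # rows may be any iterable, e.g. a generator
            writer.writerow(row)

def generate_rows(n):
    # Hypothetical lazy source: the full table never exists in memory.
    for i in range(n):
        yield [i, f"name-{i}"]

path = tempfile.mktemp(suffix=".csv")
write_csv_streaming(path, generate_rows(100_000))
```

The key design point is that nothing upstream materialises the whole table; the writer only ever sees one row at a time.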

pwaller commented 10 years ago

I've just tried xlsxwriter. In the {'constant_memory': True} mode it uses virtually no memory but a lot of CPU time: on my laptop around 14 seconds for ~100k rows, compared to "almost nothing" for CSV. I suggest we put in a hard limit of 100k rows anyway and use pyexcelerate, since it is considerably faster. Currently testing the performance of these in a few scenarios, PR incoming soon.
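For reference, a sketch of the xlsxwriter constant_memory mode mentioned above (the filename and row data are illustrative). In this mode each row is flushed to a temporary file as soon as it is written, trading CPU time for near-constant memory, with the restriction that rows must be written in order.

```python
import tempfile

import xlsxwriter  # third-party: pip install XlsxWriter

path = tempfile.mktemp(suffix=".xlsx")

# constant_memory=True flushes each row to disk as it is written,
# so memory use stays flat even for large sheets. The trade-offs:
# rows must be written in ascending row order, and it is slower
# than the default in-memory mode.
workbook = xlsxwriter.Workbook(path, {'constant_memory': True})
worksheet = workbook.add_worksheet()

for row_num in range(1000):  # illustrative row count
    worksheet.write_row(row_num, 0, [row_num, f"value-{row_num}"])

workbook.close()
```

This is a sketch under the assumption that the box's extract step can iterate its tables row by row; if rows arrive out of order, constant_memory mode cannot be used as-is.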

@drj11 @morty