metaodi / open-power-system-timeseries

https://morph.io/metaodi/open-power-system-timeseries

Scraper gets killed #1

Open metaodi opened 8 years ago

metaodi commented 8 years ago

I guess the morph.io scraper runs out of memory and is therefore killed (see https://morph.io/metaodi/open-power-system-timeseries).

So maybe a new approach is needed: process each dataset independently, write its results to the SQLite database, and delete all intermediate results before the next dataset is processed. Peak memory usage should then depend roughly on the largest single dataset rather than on all of them together.
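A minimal sketch of what that could look like, not the scraper's actual code. It assumes the datasets are already available as local CSV files with a header row; the function names, the one-table-per-file layout, the `data.sqlite` file name and the batch size are all made up for illustration. It only uses the standard library, so at most one batch of rows is held in memory at a time.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote a table or column name for SQLite (doubles embedded quotes)."""
    return '"' + str(name).replace('"', '""') + '"'


def load_csv_into_sqlite(csv_path, conn, table, batch_size=1000):
    """Stream one CSV file into a SQLite table in batches.

    Hypothetical helper: returns the number of data rows written.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file, nothing to do
            return 0
        columns = ", ".join(quote_ident(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})"
        )
        insert = f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})"
        total = 0
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(insert, batch)
                conn.commit()
                total += len(batch)
                batch.clear()  # drop the rows before reading more
        if batch:  # write the last, partially filled batch
            conn.executemany(insert, batch)
            conn.commit()
            total += len(batch)
    return total


def process_all(csv_dir, db_path="data.sqlite"):
    """Process every CSV in a directory, one file at a time."""
    conn = sqlite3.connect(db_path)
    try:
        for csv_path in sorted(Path(csv_dir).glob("*.csv")):
            load_csv_into_sqlite(csv_path, conn, table=csv_path.stem)
    finally:
        conn.close()
```

Any per-dataset processing would go inside the loop over rows (or over batches), so that nothing from one dataset is still referenced when the next one starts.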

@ingmars @sjpfenninger @elkeschaper WDYT?

ingmars commented 8 years ago

Yes, that sounds like the way to go, i.e. doing the processing and the writing to SQLite step by step for each CSV individually, so that less memory is used. I'm not sure when I will have time to work on it, though. It is probably also the first step towards the "data updating" we talked about: when the scraper runs again next month, it should only add the CSV data that is new since the previous run.
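For the updating part, one possible approach (just a sketch, nothing that exists in the scraper today) would be to record in the database which sources have already been loaded and skip them on the next run. The `processed_sources` table and both function names are hypothetical.

```python
import sqlite3


def pending_sources(conn, sources):
    """Return the sources that have not been recorded as processed yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_sources (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM processed_sources")}
    return [s for s in sources if s not in done]


def mark_processed(conn, source):
    """Record a source as processed, so that later runs skip it."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_sources (name) VALUES (?)", (source,)
    )
    conn.commit()
```

A run would then loop over `pending_sources(...)`, load each source, and call `mark_processed` only after its rows have been committed, so that a killed run can pick up where it stopped.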

sjpfenninger commented 8 years ago

One of these days, I'll break up the script into smaller pieces upstream (in the notebook repo), then plug that back into the morph scraper. Hopefully that'll let the scraper run through.