pypsa-meets-earth / earth-osm

Export infrastructure data from OpenStreetMap using Python
https://pypsa-meets-earth.github.io/earth-osm/
MIT License

Running out of memory on large (planetary) files #25

Open dziegler991 opened 1 year ago

dziegler991 commented 1 year ago

Hi all,

Does anyone have thoughts on how to programmatically break up large files so they can be run through earth-osm? I am constrained by memory. I am attempting the (almost) impossible: running a planetary pbf for lines, generators, and substations. Ideally the planet.pbf file could be read in chunks, but I am not sure that's possible.

Thoughts?

pz-max commented 1 year ago

Hi @dziegler991, sorry for the late response. You are the first outside of the PyPSA bubble using this package :1st_place_medal: Earth-osm should have no memory issues for single countries, so you could just iterate through the country list to create an extract for the whole Earth.

You can list the available regions with eo.view_regions(). Any other ideas, @mnm-matin?

But yeah, we never tried a planetary.pbf; it would be great if that works. I think these steps are necessary:

  1. Figure out how to chunk .pbf files
  2. Read the chunks and extract the information
  3. Save the results to disk, e.g. by appending to a csv file
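
Steps 2 and 3, combined with the per-country iteration suggested above, could be sketched roughly as follows. This is only an illustration of the append-to-csv pattern, not earth-osm code: `extract_region`, `fake_extract`, `append_rows`, `export_by_region`, and the column names in `FIELDS` are all hypothetical, and the real extraction call would have to be plugged in where `fake_extract` is used.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real output columns may differ.
FIELDS = ["region", "feature", "osm_id", "lon", "lat"]


def append_rows(csv_path, rows, fieldnames=FIELDS):
    """Append rows to csv_path, writing the header only if the file is new or empty."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    count = 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for row in rows:  # rows may be a generator, so nothing is buffered here
            writer.writerow(row)
            count += 1
    return count


def export_by_region(regions, extract_region, csv_path):
    """Process one region at a time so only one region's rows are in flight."""
    total = 0
    for region in regions:
        total += append_rows(csv_path, extract_region(region))
    return total


def fake_extract(region):
    """Hypothetical stand-in for the real per-region extraction step."""
    yield {"region": region, "feature": "substation", "osm_id": 1, "lon": 0.0, "lat": 0.0}
    yield {"region": region, "feature": "line", "osm_id": 2, "lon": 1.0, "lat": 1.0}
```

Calling `export_by_region(["benin", "monaco"], fake_extract, "power.csv")` would grow a single csv file region by region, so peak memory is bounded by the largest single region rather than by the whole planet.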

Mousa-Zerai commented 1 year ago

@dziegler991 I am also trying to use this for a very large file, but there is no way to run a custom pbf file through the filters. The largest file from Geofabrik is for Europe (26.2 GB), and the tool works for it. Pinging @mnm-matin.

mnm-matin commented 1 year ago

I think the problem is not with the extraction but with writing the csv and geojson files, as those are held in memory before being written. There is now a way to write these files incrementally as well, but it is not yet implemented correctly enough to be used. The idea is that most use cases will be satisfied by passing in a list of all the continents. Keeping this open for now, for potential use cases...
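
To illustrate what incremental writing of a geojson file can look like in general, here is a minimal sketch. It is not the implementation in earth-osm: the `GeoJSONStreamWriter` class and the `point_feature` helper are hypothetical names made up for this example. Each feature is written to disk as soon as it is produced, so the full feature list never has to sit in memory.

```python
import json


class GeoJSONStreamWriter:
    """Write a GeoJSON FeatureCollection one feature at a time."""

    def __init__(self, path):
        self.path = path
        self._fh = None
        self._count = 0

    def __enter__(self):
        self._fh = open(self.path, "w", encoding="utf-8")
        # Open the collection; features are appended as they arrive.
        self._fh.write('{"type": "FeatureCollection", "features": [\n')
        return self

    def write(self, feature):
        if self._count:
            self._fh.write(",\n")  # separator between consecutive features
        json.dump(feature, self._fh)
        self._count += 1

    def __exit__(self, exc_type, exc, tb):
        # Close the array and the object so the file is valid JSON.
        self._fh.write("\n]}\n")
        self._fh.close()
        return False


def point_feature(lon, lat, properties):
    """Build a minimal GeoJSON point feature (hypothetical helper)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }
```

Used as `with GeoJSONStreamWriter("substations.geojson") as w: w.write(point_feature(2.3, 6.4, {"power": "substation"}))`, the writer keeps only one feature in memory at a time. The trade-off is that the file is only valid JSON once the `with` block has exited.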