pypsa-meets-earth / earth-osm

Export infrastructure data from OpenStreetMap using Python
https://pypsa-meets-earth.github.io/earth-osm/
MIT License
23 stars, 12 forks

Decouple output folder and intermediate raw data folder #40

Closed (davide-f closed this 9 months ago)

davide-f commented 1 year ago

Is your feature request related to a problem? Please describe.

It would be nice to have the option to specify a dlfolder that keeps the raw and intermediate data, while the output files are saved to another folder. This should make simple parallelization easier. In the pypsa-earth case, we could use the data folder as the intermediate folder and the resource folder as the output folder. When the data are already downloaded, this would simplify processing considerably, because the different processes would not conflict with each other.

In addition, it would be nice to have parallel-safe operation when two or more processes share the same data dir.

Describe the solution you'd like

An option output_dir, in addition to the data dir, would be nice to have. The default value may be None, in which case the data dir is used.
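To make the proposed default concrete, here is a minimal sketch of the fallback. The helper name `resolve_dirs` and its signature are hypothetical and only illustrate the idea; this is not earth-osm's actual API.

```python
import os
import tempfile


def resolve_dirs(data_dir, output_dir=None):
    """Hypothetical helper: return (data_dir, output_dir) as absolute paths.

    When output_dir is None, outputs go to data_dir, which matches the
    proposed default and keeps a single-folder setup working.
    """
    data_dir = os.path.abspath(data_dir)
    output_dir = data_dir if output_dir is None else os.path.abspath(output_dir)
    # Create both folders up front so later steps can assume they exist.
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)
    return data_dir, output_dir


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    # Shared raw/intermediate data, separate outputs (e.g. one per process).
    print(resolve_dirs(os.path.join(base, "data"), os.path.join(base, "resources")))
    # Default: outputs land next to the raw data.
    print(resolve_dirs(os.path.join(base, "data")))
```

With something like this, several processes could point at the same data folder while each writes to its own output folder.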

This helps parallelization, but it is not completely parallel-safe. Alternatives are welcome obviously :)
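One possible alternative for the shared data dir, sketched here under assumptions: write each raw file to a temporary name in the same folder and rename it into place with `os.replace`, so other processes never see a partially written file. The function name `write_once` and the `produce_bytes` callback are hypothetical and not part of earth-osm. Note that this does not stop two processes from fetching the same file at the same moment; it only guarantees that whichever file ends up at the target path is complete.

```python
import os
import tempfile


def write_once(target_path, produce_bytes):
    """Hypothetical helper: create target_path atomically if it is missing.

    Returns True if this call wrote the file, False if it already existed.
    produce_bytes is a zero-argument callable returning the file content
    (for example, a function that downloads a pbf file).
    """
    if os.path.exists(target_path):
        return False
    target_dir = os.path.dirname(os.path.abspath(target_path))
    os.makedirs(target_dir, exist_ok=True)
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(produce_bytes())
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "region.osm.pbf")
    print(write_once(path, lambda: b"raw bytes"))  # first call writes the file
    print(write_once(path, lambda: b"other"))      # second call is a no-op
```

A lock file would be needed on top of this to avoid duplicate downloads altogether.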

mnm-matin commented 1 year ago

Yeah, you're right, that is still hardcoded. Ideally there needs to be:

[ ] a function to download the pbf file for a given region to a given location
[ ] a function to get the dataframe based on a given pbf file
[ ] add planetary osm file as a source (for people who have enough memory)

Also, in what sense do you mean parallel: downloading of pbf files or processing? I would not recommend implementing parallel downloading, as it would hog the Geofabrik servers. There are plans to support planetary files and hosted/updated filter files for power users.

mnm-matin commented 9 months ago

this has been solved