Simplifying polygons in build_shapes.py leads to holes and overlaps

jome1 commented 1 week ago

[x] I am using the current main branch or the latest release. Please indicate.
[x] I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

The script build_shapes.py downloads the GADM layers from https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/{GADM_filename}.gpkg. When inspecting the created gadm_shapes.geojson closely, one can identify holes and overlaps between the polygons describing GADM layer 1 (here called "governorates"). See the following picture:

I think this is because within the script build_shapes.py the downloaded polygons are simplified.

The following picture compares the original boarders of the governorates with the simplified boardes. Thereby the newly created and simplified polygons at some locations either overlap or leave holes in between.

In other words: The polygons are all simplified seperately. So shared boarders (edges) are not considered.

This problem might also play a role when using GADM layers as the fundamental shapes as for example mentioned by @pz-max in https://github.com/pypsa-meets-earth/pypsa-earth/issues/382 and when building the bus regions.

Possible solution

Polygons can only be simplified where they are not sharing the line with another polygon (so skip the simplification for shared boarders) OR the conterminous (neighbouring) polygons have to be simplified to the same line.

I did a little internet search. Maybe shapely.snap could help to solve this problem.

helpful mini threads about this phenomenon:

Unfortunately, ArcGis is not free, but this is exactly what we need: https://pro.arcgis.com/en/pro-app/latest/tool-reference/cartography/simplify-shared-edges.htm

this looks also very promising: https://lin-ear-th-inking.blogspot.com/2023/03/simplifying-polygonal-coverages-with-jts.html

ekatef commented 6 days ago

Hello @jome1!

Fantastic, thanks a lot for sharing your in-depth research and a clear and elaborated explanation!! That is tremendously helpful. Dealing with with #382 has been quite a mystery for quite some time, while it seems that you have managed to resolve it 🎉 🎉 🎉

As for a solution, I fear that dropping _simplify_polys( ) will result in a complexity increase for the geometries used by the model and an associated rise of time and memory requirements. But I'm not sure how significant will be this rise, if it'd be an obstacle for running the model, and which workarounds are possible to compensate it. @jome1 what is your feeling about that?

@davide-f once you would have time, it would be great to have your opinion on possible solutions.

jome1 commented 6 days ago

Thanks for the praise!

Yes, I also fear that dropping _simplify_polys() increases computational burden. It should not be dropped.

I am trying to use PyPSA-Earth to generate some data for Egypt. I have custom model regions (we merged some governorates together to have fewer regions). Therefore I once tried to use the original GADM level 1 files, merge them to my needs and insert them to PyPSA. The run was very long and ended in an error. This error is another thing one need to be aware of when using custom model regions. It means that I was only able to use the GADM files created by PyPSA and adapt them, instead of the original GADM files, because somehow the datastructure of the .geojson is different (even though all attribute columns are the same).

So, the simplification function for the polygons is important.

The good thing is that I just found out that in the blog post I posted above, which explains the whole polygon simplification of adjacent polygons really well, it is also mentioned that the developed simplification algorithm should also be downstreamed to shapely (which is used in the script): "As usual, this algorithm will be ported to GEOS, from where it will be available to downstream projects such as PostGIS and Shapely."

I just hope that this process will not take too long. Maybe I'll find out their timeline.

I also found an alternative solution, which could be used in the meantime. But maybe it is too much effort. In this thread the format topojson is recommended (topojson on GitHub). It preserves the topology and uses less memory. Maybe worth a look into it.

jome1 commented 6 days ago

I asked in the shapely repo and the needed function coverage_simplify will probably be part of the 2.1 release. It is already possible to build shapely from a branch linked to the PR implementing the function.

Unfortunately there is not yet a release date. But this sounds promising!

ekatef commented 4 days ago

I asked in the shapely repo and the needed function coverage_simplify will probably be part of the 2.1 release. It is already possible to build shapely from a branch linked to the PR implementing the function.

Unfortunately there is not yet a release date. But this sounds promising!

Fantastic news! Thanks a lot for that 😄

davide-f commented 4 days ago

Many thanks @jome1 !!! Indeed that is an interesting feature. Currently, you could play with these numbers: https://github.com/pypsa-meets-earth/pypsa-earth/blob/c1d4b993dc52d1f22a46efab983bcef5289afb95/scripts/build_shapes.py#L353-L355C17

They are not available in the config file though, we calibrated as a trade-off between accuracy and computational time. Do you think the new 2.1 release will occur soon for what you understood? Because if that's the case, and you'd be interested, you would be welcome to propose a PR about coverage_simplify

jome1 commented 4 days ago

Unfortunately, I have no idea when the 2.1 release will happen. I will keep an eye on it.

pypsa-meets-earth / pypsa-earth

Simplifying polygons in build_shapes.py leads to holes and overlaps #1051

Describe the Bug

Possible solution