ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata
317 stars 45 forks source link

Merge multipolygons with touching inner rings #30

Open mpadge opened 7 years ago

mpadge commented 7 years ago

These are permitted in OSM yet not by SF. osmdata.cpp needs to be modified to merge any touching inner rings into single rings. See also this extended OSM discussion.

Robinlovelace commented 7 years ago

I must confess I don't fully understand what this means.

mdsumner commented 7 years ago

It means that the two polygons in this first figure cannot be a single "feature", for example (left hand panel only):

image

If those two polygons were "unioned" the shared edge would have to go (a lossy process), unless the two sub-features were stored within a GEOMETRYCOLLECTION.

@Robinlovelace Does that help?

mdsumner commented 7 years ago

(Gee that OSM wiki page is good, I've never found this explained properly visually before. )

Robinlovelace commented 7 years ago

Thanks for the clarification @mdsumner. Do you have code that shows how this failing with sf. Would be the final step in me getting this I think - from the OSM wiki there are a number of edge cases that fail it seems.

mdsumner commented 7 years ago

No I don't, I would not expect sf to do that level of checking on creation, but it might now given that GEOS in built -in. I can craft a dummy example but I'll follow up if there's a known case using osm.

mdsumner commented 7 years ago

Here's an example, not the minimal possible but easy to understand I think:

tfile <- sprintf("%s.rda", tempfile())
download.file("https://github.com/mdsumner/scsf/raw/master/data/minimal_mesh.rda", tfile, mode = "wb")
load(tfile)
library(sf)

inner_ring_touching <- st_sf(a = 1, geometry = st_sfc(st_multipolygon(unlist(lapply(st_geometry(minimal_mesh), unclass), recursive = FALSE))))

#plot(inner_ring_touching)
st_is_valid(inner_ring_touching)
#[1] FALSE
#Warning message:
#In evalq((function (..., call. = TRUE, immediate. = FALSE, noBreaks. = FALSE,  :
#  Self-intersection at or near point 0.68999999999999995 0

can we fix it?

valid_mp <- st_buffer(inner_ring_touching, dist = 0)
#plot(valid_mp)
st_is_valid(valid_mp)
#[1] TRUE

The zero width buffer thing is a bit weird, but it's a "well-known" hack. To detect the invalidity and remove it you need to decompose to segments and determine if any edges are repeated in one feature (they would pass each other in different directions in this case) - the zero-width buffer op does this and then gives back a simple feature after the offending edge is detected and the rings reconstructed as a single path. Many tools work like this, a rich graph of the underlying topology is constructed, analysed, discarded in the underlying libs and then out comes a simplified summary.

I have no idea if this "fix" would be useful or damaging in the osmdata context, but I'm happy to help figure that out.

mpadge commented 7 years ago

That's awesome @mdsumner! This issue is actually pretty important, because we want osmdata to be maximally compliant with OSM itself, and this was one of the few notable cases in which sf/sp prevented that. It was nevertheless bumped fairly low on my priority status, so i'm mighty pleased that you've just conjured up a grand first cut. :1st_place_medal:

... only problem remains achieving this without direct Imports: sf, but that shouldn't be too tricky.

mdsumner commented 7 years ago

It's not so easy - I think! - but I've worked on each component step for different reasons in the past. The primitive model in sc is the least developed, I get stuck on how to keep the segment vertex pairs, whether to record their path-grouping, keep track of unique directed vs. undirected segments etc.

It will happily build the segment graph from sf:

# devtools::install_github("mdsumner/scsf")
library(scsf)
library(sc)
prim <- PRIMITIVE(inner_ring_touching)

## which segments have to go? find where interchanging vertex pairs is not unique
prim$segment$useg <-  apply(cbind(prim$segment$.vertex0, prim$segment$.vertex1), 1, function(x) paste(sort(x), collapse = "-"))
prim$segment$bad <- duplicated(prim$segment$useg)

## now we have to piece them back together, we should work on just the feature
## with the touching paths so we'd have to identify those first 
prim$segment %>% filter(!bad)

## then we need to trace around the vertex pairs to figure out the boundary cycle and reconstruct it
## in place of the original two paths

I've definitely done each of these things before in small part, it will be simpler to use sc::PRIMITIVE per feature rather than per-layer, as it's not neighbouring features you care about. Having tools for segments versus parts is very high on my agenda at the moment, I'm still searching around for the right approach but I think there's a lot to be gained here - a set of component tools for pulling things apart and putting them back together just so.

I imagine we could use igraph directly for some of this, it's not so hard to go to tidygraph and igraph from here, but I need more understanding of igraph generally to see how to drive it better.

mdsumner commented 7 years ago

I managed to wade through the various pieces I'd experimented with previously and put together some examples that almost finish the task. The first re-constructs the rings of the example above by identifying the offending segment and then finding "cycles" in the segments that remain, identifying and isolating their vertex path.

http://rpubs.com/cyclemumner/291559

This one simply replaces the simple data set with a more complex one and does the same, to show that it works on non-trivial cases.

http://rpubs.com/cyclemumner/291560

The code is ugly, and it "does my head in" because of the constant juggling of relational "tidy" ID matching with structural-index "position in array" matching. But it works, and I'm happy as it brings together quite a few things I've learnt in recent times.

I've used ggplot to plot it because I'm not quite ready to think about the per-feature aspect, but this also has problems with "holes require polypath, but also cannot also carry further group aesthetics" - which is a general limitation of ggplot2, and is related to why sf exists now (and recent updates to leaflet) to remove the "hole/part" ambiguity of multi-polygons.

The second one really highlights the need for this to be done per feature, because we otherwise will have no ability to determine where the holes belong. So there's a reasonable "to-do" list from my perspective, but I hope this shows the promise of what we could achieve with some tools to deal with segment graphs and the higher level groupings we now take for granted in the tidyverse. I really think that igraph and tidygraph will help enormously here too - I'm sure there's a find-cycles ability in igraph, but I just haven't quite been able to apply it in full - andt we need some more attention on their application to "spatial" stuff.

mdsumner commented 7 years ago

I got this in better shape, tested quite a few cases. I re-made some lovely mistakes and so learnt heaps. In short it's straightforward to do it in R, to decompose to segments, remove those that have an intersecting segment heading in the opposite direction (and that trip the GEOS validity alarm), then reconstruct the remaining valid ring by tracing around vertices. I've got it spread across two packages because of other plans but it's simple enough to implement independently without much hassle.

mpadge commented 7 years ago

This is awesome stuff - thanks so much for your great work here Mike! I haven't had time to do anything about it in the last week, but will be on it asap now ...

mdsumner commented 7 years ago

I've become a bit obsessed with this - it's been a dusty corner on the edge of my thinking for a while, and I've written a faster version using the "datastructures" package, but I'm still doing something wrong there. That cycles() function above won't scale and we'll need something much better. (Using datastructures::hashmap is faster, but I've found examples that fail for some reasons, you could probably see a way to do it easily :).

I also realize that this is exactly what raster::rasterToPolygons does with dissolve=TRUE via rgeos via sp::aggregate. It could be faster than dissolve to use rgeos::gBuffer(x, width = 0) after flattening all the polygon lists together, but what that does is exactly what we are doing, a version of chaining external edges into paths from the segment graph.

Do we have a concrete example where this inner boundary thing is a problem from osmdata?

(I've also realized now why Manifold don't disallow "internal boundaries", it's because they fundamentally always use a constrained triangulation for geometry tests, so they completely avoid the pathologies mentioned in that OSM discussion - because they aren't worried about orientation to an edge, they test intersection with simpler elements that are grouped into larger ones. A ha). If I can get this segment path thing sorted out it's a nice bridge between the finite elements approach and the path based approach to shapes.

Robinlovelace commented 7 years ago

Looks like some something that could be useful beyond osmdata. Great work by the looks of it!

mpadge commented 7 years ago

@mdsumner I reiterate: This is such awesome and fundamentally important stuff you're developing here. My promised time to actively work on it is still a day or two away, but note in the meantime that the fabulous ggm::fundCycles() routine does exactly what you need. You just need to convert your vertex matrix to a square matrix of binary neighbor connections, which is exactly what igraph::as_adj() does. That should at least be your scale-able solution there.

I must nevertheless say that at least having had the time to stick all of your analyses in my thinking cap for a few days actually disinclines me to implement this - the restriction here is SF (using my usual distinction of "SF" = OGC; "sf" = edzer), and what OSM allows is entirely sensible and more flexible. I'm very intrigued by implementing an additional osmdata_sc() function, in combination with concurrent development of the gladr issue, and ultimately to allow, encourage, and facilitate any and all of the kind of things you've been exploring here. Exciting stuff!