ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata
317 stars 45 forks source link

implement proper multipolygon handing #20

Closed mpadge closed 7 years ago

mpadge commented 7 years ago

Just a flag for me to do that

Robinlovelace commented 7 years ago

I imagine this will fix this error (was just about to open a new ticket then saw this):

bb <- getbb(place_name = "Leeds")
q <- opq(bbox = bb)
q <- add_feature(q, "amenity", "park")
res <- osmdata_sp(q = q, quiet = FALSE)

## Error in rcpp_get_osmdata(doc) : No bounding box able to be determined

Just tried this on a new computer with a fresh install of the latest version of osmdata.

mpadge commented 7 years ago

By 'proper', I meant inside-outside relationships, which are not yet done...

Robinlovelace commented 7 years ago

OK - is the error I mentioned unrelated? Can open a separate issue if so and look at what's causing it if I get a chance. Notice it's in the C++ code so may be a little hard for me to debug though...

mpadge commented 7 years ago

This commit now gives:

> bb <- getbb(place_name = "Leeds")
> q <- opq(bbox = bb)
> q <- add_feature(q, "amenity", "park")
> osmdata_sp(q = q)
Error in rcpp_get_osmdata(doc) : Query returned no data

> q <- opq(bbox = bb)
> q <- add_feature(q, "leisure", "park")
> osmdata_sp(q = q)
Object of class 'osmdata' with: (... lotsa data)

Now the question, slightly off track for this issue, but minor enough to leave here: Would it be more useful to return an empty data structure (like sf)? Note that this issue will only arise when there really is no data whatsoever - everything will otherwise work fine when even just a single point (node) is returned. In our case, I'd say erroring with no data whatsoever seems like a sensible strategy.

Robinlovelace commented 7 years ago

I think the error is useful (apologies for giving a dud example). People are very particular about giving meaningful error messages so it's really fantastic that we provide that now. I'd like to say it was a trick question but it wasn't. Error on my part that could've been checked with a quick visit to overpass-turbo - hopefully a useful erroneous query though!

Robinlovelace commented 7 years ago

By way of validation, here's a verbose way of asking for the full set of parks (from http://overpass-turbo.eu/s/l05 ):

<osm-script output="json" timeout="25">
  <union into="_">
    <query into="_" type="node">
      <has-kv k="leisure" modv="" v="park"/>
      <bbox-query e="-1.2435150146484375" into="_" n="53.9682045419455" s="53.67596808376696" w="-1.8470764160156248"/>
    </query>
    <query into="_" type="way">
      <has-kv k="leisure" modv="" v="park"/>
      <bbox-query e="-1.2435150146484375" into="_" n="53.9682045419455" s="53.67596808376696" w="-1.8470764160156248"/>
    </query>
    <query into="_" type="relation">
      <has-kv k="leisure" modv="" v="park"/>
      <bbox-query e="-1.2435150146484375" into="_" n="53.9682045419455" s="53.67596808376696" w="-1.8470764160156248"/>
    </query>
  </union>
  <print e="" from="_" geometry="skeleton" limit="" mode="body" n="" order="id" s="" w=""/>
  <recurse from="_" into="_" type="down"/>
  <print e="" from="_" geometry="skeleton" limit="" mode="skeleton" n="" order="quadtile" s="" w=""/>
</osm-script>
mpadge commented 7 years ago

@Robinlovelace A quick (sorry, it got extended) Q for ya: GDAL dumps all OSM polygons into Simple Features MULTIPOLYGONs. I think it would be more useful to dump single polygons into straight POLYGONs, and to reserve MULTIPOLYGONs only for true OSM multipolygon relations. What do you think?

Actually, it's probably necessary to provide more info for you to give an informed answer, for which I will accord with respective OGC and OSM standards of CAPS and no caps, so ...

An OSM multipolygon has IDs for each way of which it is composed. The ways themselves are also stored as single lines (SF/GDAL LINESTRINGs, retrievable with sf::st_read (...,layer='linestring')). The multipolygons returned by GDAL/R::sf retain only the ID of that object, and contain the geometry of all line segments and polygons, yet no IDs, and so it is impossible to relate any multipolygon to the underlying ways (in OSM-speak, or LINESTRINGs in SF/GDAL-speak) which comprise it. This is equivalent to treating a MULTIPOLYGON as a singular entity with no relationship to its underlying components. Using sf::st_read (...,layer='multipolygons') and sf::st_read (...,layer='linestring') will thus return much common data, yet with no way of knowing which. If both were plotted, for example, the result would depend on plot order.

I am nearly done with the sf implementation, in which I will store the OSM IDs for each underlying way as a single entry in the data.frame of an sf::sf object. These MULTIPOLYGON (and also MULTILINESTRING) objects thus have the usual osm_id for the object itself, and returned by GDAL/sf, along with additional osm_id tags for all components, and it will therefore always be possible to use the sf data.frame of MULTIPOLYGONs to extract (or remove, or whatever) the underlying SF/GDAL LINESTRINGs of which it is composed.

(FYI: Another major + for osmdata is that I store osm_id values for each node, whereas GDAL/sf also ditch all of these. GDAL/sf keeps only one osm_id value for each row of any SF object; osmdata keeps all of them for all polygons, ways, and nodes.)

Now in answering the question, please consider carefully that we will be ditching the accepted standard approach! Having said that, this would in no way contravene SF principles, and OSM admits an ultimate irreconcilability with SF. The brevity of that page could indicate a strong opportunity for us, but also that it's really important to think really carefully about this sooner rather than later.

Robinlovelace commented 7 years ago

This is very interesting. I think it explains why people brought up with GIS like me find OSM data sometimes so weird. Storing the underling point and way ids, from with polygonal relations are composed, will be a unique feature of osmdata and really valuable. I'm not sure how this will look and think documenting the structure of the data in the vignette will be a priority for explaining it to ourselves and others.

How about you start a vignette on OSM data structure in parallel with development of the code and I chip in? Or I'll happily start this important doc and get as far as I can.

Robinlovelace commented 7 years ago

Would it be correct to say that osmdata will provide an R implementation of OSM's Elements data model? And hopefully tools to translate them, in a flexible way, into standard geographical data forms.

mpadge commented 7 years ago

Yeah, I had also realised the necessity of a separate vignette here. I'll be happy to make a start on that. As for OSM Elements - nope, that's not what it means. It really all just comes down to translating OSM structures into Simple Features. Points and Lines are straightforward, but Polygons are not ... I'll let ya know when the vignette is far enough along.

Robinlovelace commented 7 years ago

Cheers, look forward to it - sure it will be an enlightening vignette. Happy to add to it as my own understanding evolves.

mpadge commented 7 years ago

multipolygons now properly dealt with in osmdata_sf. Gunna keep this issue open until osmdata_sp handles them in the same way. I've also made a good start on osm-to-sf vignette: see #14