ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata

What objects does osmdata return? #1

Closed mpadge closed 7 years ago

mpadge commented 8 years ago

A question for @Robinlovelace and @hrbrmstr: How do we want the user to control what is returned from overpass? The current overpass_query relies, through process_doc, on the query being precise enough that it will only return the lowest desired OSM objects (node->way->rel), while osmdatar currently has the three functions get-points, get-lines, and get-polygons. The two approaches now need to be reconciled within osmdata, so I'll begin with an argument for my current approach:

  1. Bob's overpass approach relies on the queries being very precise, which they may not always be.
    1. Example: multipolygons, for which a user may understandably desire the underlying way members rather than the multipolygon relation, yet the extension of process_doc to include processing of polygons would return just the latter and not the former.
    2. Example: There may be situations in which someone actually desires all the nodes which make up a way (that is, SpatialPoints), rather than the ways themselves (SpatialLines). This is possible with the osmdatar approach, yet not the overpass approach.
  2. Many OSM features are represented in a variety of formats, regardless of how strict OSM guidelines are. Public transport stations are exemplary: Many may be either node or polygon objects, and simply asking for bus_station will return only those mapped as polygons, yet will miss all purely nodal stations.
    1. Example: The overpass approach (implicitly extended to polygon extraction) will not allow a map of bus_stations as simple points, because a query for bus_stations will return the highest hierarchical objects, which will be (polygonal) ways (or maybe even multipolygons).

Having stated the essence of my thoughts on the matter, I of course acknowledge that Bob's overpass approach is superior in one important way: it obviates any need for users to specify a desired kind of output. This would be fantastic in a perfectly tidy OSM world, but I fear that the inherent messiness of everything actually requires the user to exert some degree of control, for which the simplest is surely specifying points, lines, or polygons. Thoughts?

Robinlovelace commented 8 years ago

Forcing the user to specify what they want at the outset will encourage them not to mis-understand OSM. I think the modularity of get_points(), get_lines() and get_polygons() is good. They can always be combined in a generic function that uses if statements to decide which output is best suited to the query.
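A minimal sketch of that idea, in base R only. The three getters here are hypothetical stand-ins that read from a plain list rather than the real package functions, and `get_osm()` with its `type` argument is an assumed name used purely for illustration:

```r
# Hypothetical stand-ins for the three getters: each pulls one
# component out of a plain list representing a parsed response.
get_points   <- function (doc) doc$points
get_lines    <- function (doc) doc$lines
get_polygons <- function (doc) doc$polygons

# Assumed wrapper: the user states the desired geometry type up front,
# and a chain of if statements picks the matching getter.
get_osm <- function (doc, type = c ("points", "lines", "polygons"))
{
    type <- match.arg (type)
    if (type == "points")
        get_points (doc)
    else if (type == "lines")
        get_lines (doc)
    else
        get_polygons (doc)
}

# Toy "document" with three nodes, one way, and no polygons.
doc <- list (points = c ("n1", "n2", "n3"), lines = "w1", polygons = NULL)
res_points <- get_osm (doc, "points")
res_lines <- get_osm (doc, "lines")
```

`match.arg()` rejects anything other than the three listed types, so a mistyped request fails early instead of silently returning the wrong geometry.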

mpadge commented 8 years ago

Operational first draft of merge now done, with a working resolution of this question currently being in process_doc (lines#19-26), yielding the ultimate form of returned data as

list (
        osm_nodes=rcpp_get_points (doc),
        osm_ways=rcpp_get_lines (doc),
        osm_polygons=rcpp_get_polygons (doc)
)

Question nevertheless remains open and definitely worth considering further ...

mpadge commented 8 years ago

And another q for @hrbrmstr: You've got an example in your README of returning an [out:csv] query as a data.frame. The current merged version does not allow this (through forcing [out:xml] only), but as I see it, all such data will always exist in the @data slot of the sp objects anyway. Let me know if I might have misunderstood and you in fact see a use for being able to directly return a data.frame without the intervening spatial guff? Thanks!

mpadge commented 8 years ago

sf has been released on CRAN, so an option can now be incorporated to enable the return of either sp or sf objects

Robinlovelace commented 8 years ago

Sweet!

Robinlovelace commented 7 years ago

Update: I've set overpass_query to only populate obj items if it contains data.

My latest thinking on this: just as st_read in sf now returns the first layer by default if nothing is set, I think the default should be to return the object class with the most elements, but emit a message telling the user what data is getting lost and how to retrieve it (e.g. with an arg like return_list, which is FALSE by default but can be set to TRUE).
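A rough base-R sketch of that default, under stated assumptions: `return_largest()` is a hypothetical helper, the layers are plain data.frames standing in for spatial objects, and only the `return_list` argument name comes from the suggestion above:

```r
# Assumed helper: `x` is a named list of layers, each a data.frame or NULL.
return_largest <- function (x, return_list = FALSE)
{
    if (return_list)
        return (x)
    # Count elements per layer; NULL layers count as zero.
    n <- vapply (x, function (i) if (is.null (i)) 0L else nrow (i),
                 integer (1))
    keep <- names (n) [which.max (n)]
    # Non-empty layers that are about to be dropped from the result.
    dropped <- names (n) [n > 0 & names (n) != keep]
    if (length (dropped) > 0)
        message ("Returning '", keep, "'; also available with ",
                 "return_list = TRUE: ", paste (dropped, collapse = ", "))
    x [[keep]]
}

# Toy layers: 27 points, 6 lines, no polygons.
layers <- list (osm_points = data.frame (id = 1:27),
                osm_lines = data.frame (id = 1:6),
                osm_polygons = NULL)
res <- return_largest (layers)
```

Calling `return_largest (layers, return_list = TRUE)` hands back all three layers untouched.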

mpadge commented 7 years ago

Slightly tweaked your fix to make sure objects (points/lines/polygons) are returned even if NULL. (And note that an st_read()-type approach will not work here because points will always win.) Current osmdata class (demonstrated in the README):

q0 <- opq (bbox=c(-0.12,51.51,-0.11,51.52)) # Central London, U.K.
q1 <- add_feature (q0, key='highway', value='secondary')
bu <- overpass_query (q1)
bu
#> Object of class 'osmdata' with:
#>   $bbox          : 51.51,-0.12,51.52,-0.11
#>   $overpass_call : The call submitted to the overpass API
#>   $timestamp     : [ Thu Dec  1 11:52:33 2016 ]
#>   $osm_points    : 'sp' SpatialPointsDataFrame   with 27 points
#>   $osm_lines     : 'sp' SpatialLinesDataFrame    with 6 lines
#>   $osm_polygons  : NULL

Notwithstanding potential sf migration (see #17), I think this is a pretty reasonable class structure. It'll definitely be type-stable.
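To illustrate what type-stable means here, a base-R sketch of a list that always carries the same slots, whether or not they hold data. `new_osmdata_sketch()` is a hypothetical constructor written for this example, not the package's actual code; the slot names are taken from the printout above:

```r
# Hypothetical constructor: every slot is always present, even when NULL.
new_osmdata_sketch <- function (bbox = NULL, overpass_call = NULL,
                                timestamp = NULL, osm_points = NULL,
                                osm_lines = NULL, osm_polygons = NULL)
{
    # Building the list in a single list() call keeps NULL entries;
    # assigning `obj$osm_polygons <- NULL` afterwards would drop the slot.
    obj <- list (bbox = bbox, overpass_call = overpass_call,
                 timestamp = timestamp, osm_points = osm_points,
                 osm_lines = osm_lines, osm_polygons = osm_polygons)
    class (obj) <- "osmdata"
    obj
}

x <- new_osmdata_sketch (bbox = "51.51,-0.12,51.52,-0.11")
slot_names <- names (x)
```

Because the names never change, downstream code can test `is.null (x$osm_polygons)` without first checking whether the slot exists.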

Problems?

  1. The difficulty of implementing a plot method, although this would be possible by, for example, simply plotting everything.

mpadge commented 7 years ago

See this comment for some very important thoughts on the appropriateness of sf for OSM data

mpadge commented 7 years ago

Closing this now, because I've significantly revised get-osmdata.cpp so it processes all data at once (compared to previous approach of separately processing points, lines, polygons). The three spatial forms are thus now inseparable, and so osmdata has to return a list with all of them, in the class structure shown above.

Robinlovelace commented 7 years ago

Fantastic work, this looks like a great solution.