ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata
317 stars 45 forks source link

osm_points contain vertices of objects that are osm_polygons #123

Closed Robinlovelace closed 6 years ago

Robinlovelace commented 6 years ago

You may have known this already but it caught me unawares today while teaching. Reprex below which flags another (minor) issue about subsetting to bbox square or polygon. Not saying this is a bug but I think it needs documenting better at least:

library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
library(tidyverse)
#> ── Attaching packages ───────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.4
#> ✔ tidyr   0.8.0     ✔ stringr 1.3.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ──────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2

# check region
region = getbb("bristol", format_out = "sf_polygon")

# get shops
shops = opq("bristol") %>% add_osm_feature(key = "shop", value = "department_store") %>% 
  osmdata_sf()

nrow(shops$osm_points)  # lots of shops
#> [1] 142
nrow(shops$osm_polygons)  # and a few polygons
#> [1] 12
plot(shops$osm_polygons$geometry[1])
plot(shops$osm_points$geometry, add = TRUE)
# woa: some of the points are part of the polygons!

# which ones?
sel_touching = st_touches(shops$osm_points, shops$osm_polygons) %>% sapply(any)
#> although coordinates are longitude/latitude, st_touches assumes that they are planar
shops_points = shops$osm_points[!sel_touching, ]
nrow(shops_points)
#> [1] 1

# another potential issue: shops are returned outside the bounding polygon:
shops_bristol = shops$osm_points[region, ]
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
nrow(shops_bristol)
#> [1] 89
plot(shops_bristol$geometry, col = "red", add = T)

fzenoni commented 6 years ago

I am aware of this issue since quite some time, and this is why I "reduce" all the POI's I query to single Points. Since, e.g., some restaurants are returned as points and other as (multi)polygons, I usually separately compute the centroids of the latter category, and eventually merge them back with osm_points.

This is really due to OSM anarchy. Similar issues have been recently summarized here -> https://blog.emacsen.net/blog/2018/02/16/osm-is-in-trouble/

mpadge commented 6 years ago

Thanks @fzenoni - @Robinlovelace and I had also seen and discussed that blog post. Some interesting thoughts there. As for the issue here, it is an intentional part of osmdata design that all list items (POINT, LINESTRING, etc) hold all possible data, so that for example the elements of a given LINESTRING can be examined in detail. The points which make up an object often have their own descriptive fields, notably including traffic lights, for example.

The general approach to extract only points which are not parts of anything else (or same for lines, polygons, whatever) is the unique_osmdata() function. That'll instantly get rid of any points that are part of polygons. Alternatively, to extract only those points corresponding to a given higher-level object, you just need to

dat <- opq ("pune india") %>%
    add_osm_feature (key = "highway") %>%
    osmdata_sf (quiet = FALSE)
l1 <- dat$osm_lines [1, ] # first line, for example
# following line requires geom [[1]] for lines; geom [[1]] [[n]] for polygons; etc
pt_names <- rownames (l1$geometry [[1]])
# sf points are just lists, so names not rownames
indx <- match (pt_names, names (dat$osm_points$geometry))
pts <- dat$osm_points [indx, ]

pts then holds all details of all the points that make up the first linestring. These named objects are the way that osmdata tricks sf by hiding names of each list item. GDAL does not use such names, and so sf simply ignores them, enabling OSM structure to be translated into sf. The question is should there be some kind of function to enable this? For example:

obj <- match_osmdata (dat$obj_target, dat$obj_ref)

where match_osmdata will reduce the ref object to only those items present in target? Probably need better terminology, but hopefully you'll get my gist. Thoughts @fzenoni? @Robinlovelace?

Robinlovelace commented 6 years ago

Ultimately the points represented in plot(shops_bristol$geometry, col = "red", add = T) is one object so I think some people, especially people new to OSM data, should get some help in identifying that that is the case. My first thought was that there were 142 shops in the study region, and I'm an experienced R user who's used OSM before. People new to both subjects (like the students I was teaching) will easily make that mistake unless they are warned about it. A warning message saying something like the following could help:

Warning: some of the points in the downloaded data are part of
 larger objects(such as relations defining polygons).
 Take care not to count multiple points that are part of a single
 object as multiple objects for example with the function osm_aggregate_points().

osm_aggregate_points() is an imaginary function that could do this work, by taking an osmdata object in and returning points representing either 1) points in OSM that are single entities plus points representing the centroids of polygons containing multiple points (a sensible default my students would find useful I think) or 2) an object that simply omits points that are already captured by polygons (i.e. have all attributes equal to their larger related polygon).

That's my thought for now, not sure how hard it would be to implement, not too tricking I'd imagine... What do others think?

fzenoni commented 6 years ago

At first I misread the original post, and my answer refers to a different matter. But indeed, I had observed that same problem with churches, in which the polygon is stored, as well as points, representing for example entrance doors. At the time I was computing POI densities, and this fact, previously unknown to me, was considerably messing with my results.

However, as mentioned by @mpadge, I thought that this problem could be solved simply by applying unique_osmdata(), and for the cases I encountered so far it has been the case. If you repeat @mpadge's original exercise with the following:

dat <- opq('dublin') %>%
       add_osm_feature('amenity', 'place_of_worship') %>%
       osmdata_sf() %>%
       unique_osmdata()

you will find no superimposed POINT to the first building of the list. And the same goes for @Robinlovelace's department stores, for which the POINTs are drastically reduced after the osmdata_unique() treatment. Doesn't this already match the following requirement?

an object that simply omits points that are already captured by polygons (i.e. have all attributes equal to their larger related polygon).

Nevertheless, superimposed points are still present in the highway LINESTRING example, even with unique_osmdata(). I hope I got your point, but could it be that the proposal you're making is more relevant for the latter case? (On the other hand I realize the evidence I am showing is probably anecdotal.)

On a different subject:

@Robinlovelace and I had also seen and discussed that blog post.

I would be very interested in reading your observations. Where may I find them?

mpadge commented 6 years ago

@fzenoni: I'm no longer sure whether we have an issue here or not? I get no superimposed points with any application of unique_osmdata() - feel free to past some reproducible code if you can find any anomalous cases. @Robinlovelace I'm going to assign this issue to you, because I guess it boils down to a documentation issue, and I'm not sure what documentation would be required or desired here. (An admission on my part that I'm simply too close to all of this to see the proverbial forest for the tall green things.)

Finally, @fzenoni re: "OSM is in trouble": We mostly exchanged via email, not much in gh, but effectively both came to the conclusion that although some criticisms may have been valid, the ones regarding structural aspects were not. Changing OSM structures in any of the ways suggested might indeed make it easier for certain mapping services to become more flexible, integrated, whatever, but at the very significant cost of an overall less general and powerful data structure for analytic purposes.

fzenoni commented 6 years ago

@fzenoni: I'm no longer sure whether we have an issue here or not? I get no superimposed points with any application of unique_osmdata() - feel free to past some reproducible code if you can find any anomalous cases.

You are right, my bad.