Open Mashin6 opened 2 years ago
Thanks @Mashin6, and apologies for the delay in responding here. It's not a bug, it reflects the operation of the exclude
parameter in the trim_osmdata()
function:
library (ggplot2)
library (osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
area <- getbb("Southeastern Connecticut COG", featuretype = "boundary", format_out = "polygon")
towns <- opq(bbox = area) %>%
add_osm_feature(key="admin_level", value="8") %>%
osmdata_sf()
plot1 <- function (towns, exclude = TRUE) {
towns |>
trim_osmdata(area, exclude = exclude) |>
magrittr::extract2 ("osm_multipolygons") |>
ggplot() +
geom_sf()
}
plot1 (towns, exclude = TRUE)
#> Loading required namespace: sf
plot1 (towns, exclude = FALSE)
Created on 2022-01-02 by the reprex package (v2.0.1.9000)
Though e.g. Colchester is completely within SCCOG, because the same ways that define COG boundary also define the boundary of the town. So I would still expect it to be included. Hmm unless the polygon from Nominatim has wrong shape. I guess more reason to have server side trimming.
www.openstreetmap.org/relation/11065863 www.openstreetmap.org/relation/11059185
Yeah, right, it's an issue with how to interpret multipolygons. The trim operation for Colchester is extracting the internal component of the multipolygon. In cases where internal components of multipolygons are all distinct and do not share any boundaries, they will generally (in OSM terms) define holes within the surrounding main polygon. It might thus be more appropriate to default to extracting the primary surrounding polygon, rather than the internal components. I'll re-open to implement that.
@Mashin6 The above commit improves the trim_osmdata()
function, but your issue still remains. It is, however, due to a bug in the OSM data set, as this reprex
illustrates:
library (ggplot2)
library (sf) # load for sub-setting
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.0.1; sf_use_s2() is TRUE
library (osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
area <- getbb("Southeastern Connecticut COG", featuretype = "boundary", format_out = "polygon")
towns <- opq(bbox =area) |>
add_osm_feature(key="admin_level", value="8") |>
osmdata_sf()
mp <- towns$osm_multipolygons$geometry [[22]] |>
st_sfc (crs = 4326)
ggplot (mp) + geom_sf ()
bb <- st_sfc (st_polygon (list (area)), crs = 4326)
st_within (mp, bb)
#> Sparse geometry binary predicate list of length 1, where the predicate
#> was `within'
#> 1: (empty)
That result shows that the multipolygon does not lie within the bounding box. The following shows in a bit more detail why that happens (using sp
for simple return of direct integer values), including first necessary step of transforming to non-elliptical CRS:
mp <- st_transform (mp, 3857) [[1]] [[1]] [[1]] # multipolygon
bb <- st_transform (bb, 3857) [[1]] [[1]]
pip <- sp::point.in.polygon (mp [, 1], mp [, 2],
bb [, 1], bb [, 2])
table (pip)
#> pip
#> 0 1 3
#> 2 83 705
Created on 2022-01-03 by the reprex package (v2.0.1.9000)
The point.in.polygon
function returns:
And so that result shows that 2 points on the boundary of Colchester lie geographically outside the bounding area. The overpass API correctly filters because it uses relation memberships based on OSM ID values. That works in this case even though it arguably shouldn't because the result is geographically invalid. The osmdata
approach remains geographically valid, and will return identical results to server-side filtering as long as those non-geographic results are also geographically correct. In this (hopefully outlier) case, they are not geographically correct, and so osmdata
does what I would consider to be the appropriate thing. Does that make sense?
Thanks for the analysis. I think you are right, I checked and Nominatim version of geometry for SCCOG is different from current OSM version. It seems Nominatim does not update geometry in cases where only nodes are moved, but way or relation is not touched. Might be worth adding a warning message to trim_osmdata() about possibility that results might not always be correct.
_Side note, map_to_area;
in OverPass does spatial subsetting of objects and does not need IDs. But it also includes objects that are only partially inside the area._
I notionally agree with you about adding a note, and tried doing that, but the function actually does return results that are strictly correct, and according to the above example, in fact more correct than the internal overpass trimming. Correctness can only be judged in geometric terms, and the function in current form is geometrically correct. Feel free to have a go yourself if you can think of a smart way to phrase such a note (as an extension of the Note in trim_osmdata
), but please ensure it clearly states that results will always be geometrically correct, even if they are ... unexpected, or whatever adjective might be used to describe cases like the above.
My attempt went something like this:
The operation is also a strictly geometric trimming, and may not necessarily yield expected results when used in combination with bounding polygons obtained from other sources, potentially including Nominatim.
That would only make sense if it were clear what expected results might look like, which it is not. Don't know how better to do that for now.
Right, this is a very special case of using trim_osmdata. But since it is suggested in the help page of the function, having a sentence or two might save users some time when scratching their heads why were some objects being removed.
What about this?
Caution is advised when using polygons obtained from Nominatim via getbb(..., format_out = "polygon"|"sf_polygon"). These shapes can be outdated and thus could cause that trimming operation might not give results expected based on the current state of OSM data.
That sounds great - could you please add that via a pull request? Thanks!
There might be possibly a bug in how
trim_osmdata()
filters the data. It seems to be removing some objects that should be in the final dataset.Created on 2021-11-23 by the reprex package (v2.0.1)