Closed agila5 closed 4 years ago
Thanks @agila5 - good sleuthing. I'll check it out asap
I agree with you that there's something strange going on there, but I'm not so sure it's actually a bug. The multipolygon extraction in get1bdymultipolygon
actually seems right, because it does return the first element on the nominatim data. This is what the data on the server look like:
And here's a reprex of what current code does:
library(magrittr)
q_url <- paste0 ("https://nominatim.openstreetmap.org/?q=",
"Isle%20of%20Wight&featuretype=settlement&",
"polygon_text=1&format=json&limit=10")
obj <- httr::RETRY("POST", q_url, times = 10) %>%
httr::content(as = "text", encoding = "UTF-8", type = "application/xml") %>%
jsonlite::fromJSON()
bn <- as.numeric(obj$boundingbox[[1]])
bb_mat <- matrix(c(bn[3:4], bn[1:2]), nrow = 2, byrow = TRUE)
dimnames(bb_mat) <- list(c("x", "y"), c("min", "max"))
indx_multi <- which (grepl ("MULTIPOLYGON", obj$geotext))
gt_p <- gt_mp <- NULL
gt_mp <- obj$geotext [indx_multi] %>%
gsub ("MULTIPOLYGON\\(\\(\\(", "", .) %>%
gsub ("\\)\\)\\)", "", .) %>%
strsplit (split = ',')
indx_na <- rev (which (is.na (gt_mp)))
for (i in indx_na)
gt_mp [[i]] <- NULL
# gt_mp is then a list of 1 item
# gt_mp <- lapply (gt_mp, function (i) get1bdymultipoly (i))
p <- gt_mp [[1]]
p <- p [1:min (which (grepl (")", p)))]
p <- vapply (p, function (i) gsub (")", "", i),
character (1), USE.NAMES = FALSE)
t (cbind (vapply (p, function (i)
as.numeric (strsplit (i, split = ' ') [[1]]),
numeric (2), USE.NAMES = FALSE)))
#> [,1] [,2]
#> [1,] -1.591801 50.66224
#> [2,] -1.591789 50.66221
#> [3,] -1.591729 50.66221
#> [4,] -1.591604 50.66222
#> [5,] -1.591516 50.66223
#> [6,] -1.591391 50.66224
#> [7,] -1.591220 50.66228
#> [8,] -1.591024 50.66230
#> [9,] -1.590832 50.66233
#> [10,] -1.590618 50.66236
#> [11,] -1.590445 50.66241
#> [12,] -1.590364 50.66243
#> [13,] -1.590480 50.66247
#> [14,] -1.590564 50.66252
#> [15,] -1.590731 50.66250
#> [16,] -1.590939 50.66246
#> [17,] -1.591193 50.66240
#> [18,] -1.591481 50.66235
#> [19,] -1.591564 50.66233
#> [20,] -1.591630 50.66229
#> [21,] -1.591705 50.66228
#> [22,] -1.591793 50.66227
#> [23,] -1.591801 50.66224
Created on 2019-12-04 by the reprex package (v0.3.0)
So it is correctly returning the first element of the multipolygon, which has 23 coordinates. Maybe there's something wrong with the actual multipolygon structure as specified in nominatim? Do you have any further insights, or information about what might actually be wrong here?
(And the second issue regarding seq
:
https://github.com/ropensci/osmdata/blob/9560e249a1b4f9edb3a5bbd4b49e082fbad0fdd8/R/getbb.R#L239
is okay, since obj
is always a list with at least 1 item, and seq
will always just enumerate the length of that list. Could arguably be safer to use seq_along
, but not necessary there.)
So the one that you want here is [[3]], which your result above indicated:
# which(grepl(")", p))
# [1] 23 42 8373 8398
meaning item sizes of:
Easy there to think that [[3]] is the one you want, but that's not a reliable diagnostic. Nor is it appropriate to automatically select the largest polygon, because for administrative boundaries in particular that can prevent selection of smaller scale boundaries by always preferring equivalent, larger scale ones... That said, this should then work, but clearly does not at all:
bb <- getbb("Isle of Wight", format_out = "sf_polygon", poly_num = 1)
yet clearly fails:
... so there is something going on. Moreoever, that should work with poly_num = 3
, but fails with subscript out of bounds
Hi @mpadge, thanks for your analysis. I'm checking this problem again and I still don't get one point. What's your expected output for str(getbb("Isle of Wight", format_out = "polygon"))
? Because when I look at the URL of that query here I see:
so I would expect to obtain 1) a list of 6 polygons or 2) a list of 2 elements where the first/second element is a list of 6 polygons. Does it make sense? At the moment I get the same output as in the original comment, i.e.
str(getbb("Isle of Wight", format_out = "polygon"))
#> List of 2
#> $ : num [1:1785, 1:2] -76.9 -76.9 -76.9 -76.9 -76.9 ...
#> $ : num [1:23, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
Created on 2019-12-04 by the reprex package (v0.3.0)
which means that I get the fully polygon object and just the first polygon of the multipolygons.
(And the second issue regarding seq is okay, since obj is always a list with at least 1 item, and seq will always just enumerate the length of that list. Could arguably be safer to use seq_along, but not necessary there.
I think that you are missing my point. I was not pointing out the difference between seq and seq_along. I was saying that here
https://github.com/ropensci/osmdata/blob/ec45cb1608921d0eb3016c4f73513f5614d67ee0/R/getbb.R#L239
you are testing something wrt indx_multi
, which was defined as
https://github.com/ropensci/osmdata/blob/ec45cb1608921d0eb3016c4f73513f5614d67ee0/R/getbb.R#L223
This means that indx_multi
is a numeric vector of indexes of which ROWS of obj$geotext
contains the string MULTIPOLYGON, right?. This implies that the other rows are not MULTIPOLYGONS, right? Is so, then IMO the following check
https://github.com/ropensci/osmdata/blob/ec45cb1608921d0eb3016c4f73513f5614d67ee0/R/getbb.R#L239
is wrong since you are testing a condition wrt to the COLUMNS of obj
. For example, I think you are doing something like this:
indx_setosa <- which(grepl("setosa", iris$Species))
indx <- which(!(seq(iris) %in% indx_setosa))
indx
#> integer(0)
Created on 2019-12-04 by the reprex package (v0.3.0)
while what you intended is
indx_setosa <- which(grepl("setosa", iris$Species))
indx <- which(!(seq(nrow(iris)) %in% indx_setosa))
indx
#> [1] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> [19] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
#> [37] 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
#> [55] 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
#> [73] 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
#> [91] 141 142 143 144 145 146 147 148 149 150
Created on 2019-12-04 by the reprex package (v0.3.0)
I hope it makes sense. If I'm wrong just ignore this issue and excuse me for messing with two separate questions in just one github issue.
yes, @agila5, you're right that this is indeed a bug in getbb
, which arises because the function was coded with the -- incorrect -- assumption that multipolygon boundaries would always be constructed as outer/inner
polygons, but of course they can simply be lists of spatially disjoint polygons, as is the case with the Isle of Wight. So yes, getbb()
should in this case return all of the polygons. The above commit now gives:
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
bb <- getbb ("Isle of Wight", format_out = "polygon")
str (bb)
#> List of 2
#> $ : num [1:1785, 1:2] -76.9 -76.9 -76.9 -76.9 -76.9 ...
#> $ :List of 5
#> ..$ : num [1:23, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:19, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:8331, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:25, 1:2] -1.1 -1.1 -1.1 -1.1 -1.1 ...
#> ..$ : num [1:50, 1:2] -1.09 -1.09 -1.09 -1.09 -1.09 ...
Created on 2020-01-05 by the reprex package (v0.3.0)
This may present problems when automatically piping outputs to subsequent functions, so it may be better to unlist
the nested lists, but then again the result directly reflects the nominatim
structure, and unlisting would destroy that. So i suspect it's better to leave subsequent selection of a desired bounding polygon to the user, but please feel free to suggest or discuss any alternative.
I'll now re-open the issue and invite you to submit a PR to fix your second fault, which you did indeed identify. Changing that one line from seq(obj)
to seq(nrow(obj))
is exactly the fix needed. Once you've done that, then I'll also add you to the official list of package contributors, also as a reflection of the contribution all of your insights have been here. Thanks in advance!
This may present problems when automatically piping outputs to subsequent functions, so it may be better to unlist the nested lists, but then again the result directly reflects the nominatim structure, and unlisting would destroy that. So i suspect it's better to leave subsequent selection of a desired bounding polygon to the user, but please feel free to suggest or discuss any alternative.
I'm not sure but I think that the current API is fine. I don't know about the "typical" usage of getbb function but I've never had any problem with the output format, just with the results.
On a related topic, I think that it should be specified in the help page that the poly_num
parameter works only on sf_output
format and it cannot be used to select, for example, the third polygon from the second list of polygons. For example
# package
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
str(getbb("isle of wight", format_out = "polygon", poly_num = c(2, 3)))
#> List of 2
#> $ : num [1:1785, 1:2] -76.9 -76.9 -76.9 -76.9 -76.9 ...
#> $ :List of 5
#> ..$ : num [1:23, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:19, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:8331, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:25, 1:2] -1.1 -1.1 -1.1 -1.1 -1.1 ...
#> ..$ : num [1:50, 1:2] -1.09 -1.09 -1.09 -1.09 -1.09 ...
Created on 2020-01-05 by the reprex package (v0.3.0)
I'm not sure if that's intended (but I think so) and I can add one sentence to the help page in the PR if you want.
We actually should just have the sf
version also return the full result, including all polygons and multipolygons. Reopening until that's done
@agila5 That commit gives the following output:
setwd ("/data/mega/code/repos/ropensci/osmdata")
devtools::load_all (".", export_all = TRUE)
#> Loading osmdata
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library (sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
bb <- getbb ("Isle of Wight", format_out = "polygon")
str (bb)
#> List of 2
#> $ : num [1:1785, 1:2] -76.9 -76.9 -76.9 -76.9 -76.9 ...
#> $ :List of 5
#> ..$ : num [1:23, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:19, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:8331, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> ..$ : num [1:25, 1:2] -1.1 -1.1 -1.1 -1.1 -1.1 ...
#> ..$ : num [1:50, 1:2] -1.09 -1.09 -1.09 -1.09 -1.09 ...
bbsf <- getbb ("Isle of Wight", format_out = "sf_polygon")
bbsf
#> $polygon
#> Simple feature collection with 1 feature and 0 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: -76.93027 ymin: 36.64411 xmax: -76.4453 ymax: 37.15405
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> geometry
#> 1 POLYGON ((-76.93027 36.7048...
#>
#> $multipolygon
#> Simple feature collection with 1 feature and 0 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -1.591801 ymin: 50.57468 xmax: -1.062719 ymax: 50.76756
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> geometry
#> 1 MULTIPOLYGON (((-1.591801 5...
str (bbsf)
#> List of 2
#> $ polygon :Classes 'sf' and 'data.frame': 1 obs. of 1 variable:
#> ..$ geometry:sfc_POLYGON of length 1; first list element: List of 1
#> .. ..$ : num [1:1785, 1:2] -76.9 -76.9 -76.9 -76.9 -76.9 ...
#> .. ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
#> ..- attr(*, "sf_column")= chr "geometry"
#> ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..:
#> .. ..- attr(*, "names")= chr(0)
#> $ multipolygon:Classes 'sf' and 'data.frame': 1 obs. of 1 variable:
#> ..$ geometry:sfc_MULTIPOLYGON of length 1; first list element: List of 1
#> .. ..$ :List of 5
#> .. .. ..$ : num [1:23, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> .. .. ..$ : num [1:19, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> .. .. ..$ : num [1:8331, 1:2] -1.59 -1.59 -1.59 -1.59 -1.59 ...
#> .. .. ..$ : num [1:25, 1:2] -1.1 -1.1 -1.1 -1.1 -1.1 ...
#> .. .. ..$ : num [1:50, 1:2] -1.09 -1.09 -1.09 -1.09 -1.09 ...
#> .. ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
#> ..- attr(*, "sf_column")= chr "geometry"
#> ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..:
#> .. ..- attr(*, "names")= chr(0)
Created on 2020-01-07 by the reprex package (v0.3.0)
so all polygon formats return all nominatim polygon and multipolygon objects - and the sf
bounding boxes immediately show that the polygon
object is not the UK Isle of Wight, while the multipolygon
is the desired result, and contains all islands.
Ok, great and thanks! My only note is on the poly_num
parameter because I think it's useless now. Do you plan to reintroduce it adding the possibility to automatically select one of the polygons returned by the query? If not, I think you should just remove that and update the docs.
Hi @mpadge. I'm sorry to bother you again but I just need to ask another question.
When I use the getbb
function (as in the next reprex) with format_out = "polygon"
I see a list with two polygons and I'm interested only in the first one but the output of getbb
function with format_out = "sf_polygon"
is one sf object with two rows (i.e. the two polygons). Obviously is trivial to filter only the first polygon but I just wanted to ask if that's intended behaviour or not.
# packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
getbb("Lecco, Italy", format_out = "data.frame")
#> place_id
#> 1 198659884
#> 2 198006636
#> 3 254490413
#> licence
#> 1 Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright
#> 2 Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright
#> 3 Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright
#> osm_type osm_id boundingbox lat
#> 1 relation 45639 45.6493752, 46.151716, 9.2437341, 9.5411056 45.9005485
#> 2 relation 46229 45.8121895, 45.9061432, 9.3667651, 9.4848958 45.8553764
#> 3 node 62505597 45.6953764, 46.0153764, 9.229605, 9.549605 45.8553764
#> lon display_name class type
#> 1 9.41202482143963 Lecco, Lombardia, Italia boundary administrative
#> 2 9.389605 Lecco, Lombardia, 23900, Italia place city
#> 3 9.389605 Lecco, Lombardia, 23900, Italia place city
#> importance
#> 1 0.7340101
#> 2 0.6544993
#> 3 0.6544993
#> icon
#> 1 https://nominatim.openstreetmap.org/images/mapicons/poi_boundary_administrative.p.20.png
#> 2 https://nominatim.openstreetmap.org/images/mapicons/poi_place_city.p.20.png
#> 3 https://nominatim.openstreetmap.org/images/mapicons/poi_place_city.p.20.png
str(getbb("Lecco, Italy", format_out = "polygon"))
#> List of 2
#> $ : num [1:7763, 1:2] 9.24 9.25 9.25 9.25 9.25 ...
#> $ : num [1:820, 1:2] 9.37 9.37 9.37 9.37 9.38 ...
getbb("Lecco, Italy", format_out = "sf_polygon")
#> Simple feature collection with 2 features and 0 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: 9.243734 ymin: 45.64938 xmax: 9.541106 ymax: 46.15172
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> geometry
#> 1 POLYGON ((9.243734 45.76648...
#> 2 POLYGON ((9.366765 45.87497...
Created on 2020-01-08 by the reprex package (v0.3.0)
Yeah, that is intentional now, and I think this is okay behaviour, but am totally open to suggestions for alternative approaches. The poly_num
argument has gone, and there is no longer any default selection of first polygon in cases returning multiple polygons. Default behaviour in all cases in now to return everything. Does that make sense?
Hi! I was working on creating an example of transforming a
sf
object with LINESTRING geometry into alinnet
object and I think I found a bug ingetbb
. This is the problem.There are three matches with my query and I'm interested only in the first one. So I extracted the polygon format and I found something weird.
There are only two matches (but that's not a problem since the third row of the data.frame output was just a point so it makes sense to exclude that point from the result). The problem is that the bb associated with the Isle of Wight located in the South East of England is the second element of the list and the result is clearly wrong since there are only 23 coordinates. For example, this is a map of that result:
It's not clear from the image that that POLYGON is in the Isle of Wight but you can recognize it if you run the code locally.
Then I checked the URL associated with that query, i.e. here and I think you can see that the geotext element of the first MULTIPOLYGON object is quite different from the result I get from getbb
IMO the problem is here https://github.com/ropensci/osmdata/blob/9560e249a1b4f9edb3a5bbd4b49e082fbad0fdd8/R/getbb.R#L257 which just calls https://github.com/ropensci/osmdata/blob/9560e249a1b4f9edb3a5bbd4b49e082fbad0fdd8/R/getbb.R#L343-L346
If I run that function locally in debug mode I get
which explains why the second element for the list has 23 rows but I think that's not the intended result. Unfortunately I couldn't think of any easy fix of that problem.
Moreover when I run the getbb function locally in debug mode I found this line https://github.com/ropensci/osmdata/blob/9560e249a1b4f9edb3a5bbd4b49e082fbad0fdd8/R/getbb.R#L239 which IMO is also wrong since
seq(obj)
returns a sequence with length equals to the number of columns in a dataframe (i.e.)while I think in that line of code you are looking for matches in the rows of the obj object.
I hope it's clear.
Created on 2019-12-01 by the reprex package (v0.3.0)