ropensci / osmextract

Download and import OpenStreetMap data from Geofabrik and other providers
https://docs.ropensci.org/osmextract
GNU General Public License v3.0

key and attribute #45

Closed mtennekes closed 3 years ago

mtennekes commented 4 years ago

Questions:

One suggestion would be to include the key into the attributes. For instance this does not work:

x = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "riverbank")

while this works:

x = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "riverbank", 
   attributes = "waterway")

On a side note, it doesn't seem to collect all riverbanks. I downloaded a shapefile from https://mapcruzin.com/free-france-arcgis-maps-shapefiles.htm which is also extracted from OSM, but which does contain all riverbanks.

Strangely, for the key/attribute natural it is the other way round:

Works:

x = get_geofabrik("Ile-de-France", 
    layer = "multipolygons", 
    key = "natural", value = "water")

Does not work:

x = get_geofabrik("Ile-de-France", 
    layer = "multipolygons", 
    key = "natural", value = "water", 
    attributes = "natural")

A select all query does not work. It doesn't throw an error, but returns an empty sf object:

x  = get_geofabrik("Ile-de-France", 
   layer = "multipolygons", 
   key = "waterway", value = "*", 
   attributes = "waterway")

agila5 commented 3 years ago

Hi @mtennekes and thanks for your questions. We completely rewrote the package in July, and I think the rewrite addresses most of them.

Regarding terminology, what is the difference between key and attribute? It is confusing.

OSM describes physical features using tags. A tag is a pair of two items: a key (e.g. highway) and a value (e.g. primary). GDAL uses the term attribute in the osmconf.ini file to describe which tags should be read explicitly as separate fields, while all other tags are appended to the other_tags field. See here for more details. For this reason, that argument is now called extra_tags.

For example:

# packages
suppressPackageStartupMessages(library(osmextract))

names(oe_get("Isle of Wight"))
#>  [1] "osm_id"     "name"       "highway"    "waterway"   "aerialway" 
#>  [6] "barrier"    "man_made"   "z_order"    "other_tags" "geometry"
names(oe_get("Isle of Wight", extra_tags = c("oneway", "maxspeed", "lit", "junction")))
#>  [1] "osm_id"     "name"       "highway"    "waterway"   "aerialway" 
#>  [6] "barrier"    "man_made"   "oneway"     "maxspeed"   "lit"       
#> [11] "junction"   "z_order"    "other_tags" "geometry"

Created on 2020-10-28 by the reprex package (v0.3.0)

I agree that the previous terminology was slightly unclear, but I think that it should be understandable now.
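To make the role of other_tags more concrete, here is a minimal sketch in base R that splits one other_tags string into key/value pairs. It assumes the string uses the hstore-like form "key"=>"value","key"=>"value" and that values contain no escaped double quotes. The helper parse_other_tags() and the sample string are hypothetical and are not part of osmextract.

```r
# Hypothetical helper (not part of osmextract): split one other_tags string,
# assumed to look like "key"=>"value","key"=>"value", into a named vector.
parse_other_tags <- function(x) {
  # Find every "key"=>"value" pair in the string
  pairs <- regmatches(x, gregexpr('"[^"]*"=>"[^"]*"', x))[[1]]
  # Keep the text inside the first and the second pair of double quotes
  keys <- sub('^"([^"]*)"=>.*$', "\\1", pairs)
  values <- sub('^"[^"]*"=>"([^"]*)"$', "\\1", pairs)
  stats::setNames(values, keys)
}

# Made-up example string, for illustration only
tags <- parse_other_tags('"oneway"=>"yes","maxspeed"=>"30 mph","lit"=>"no"')
tags[["maxspeed"]]
#> [1] "30 mph"
```

Any key listed in extra_tags is read into its own column instead, so it no longer needs to be parsed out of other_tags like this.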

Why are not all available keys/attributes included in ini_new.ini? E.g. waterways is not included as an attribute of the layers multipolygons, while some waterways values are polygons (e.g. riverbanks)

I'm not sure that I understand your question, but we decided not to include all available keys since there are too many tags and I think that the best approach is to manually select which keys you are interested in. For example, the following code can be used to extract all keys stored in a .pbf file:

# packages
suppressPackageStartupMessages(library(osmextract))

# data
iow <- oe_get("Isle of Wight")
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that
head(oe_get_keys(iow), 10)
#>  [1] "lit"         "lanes"       "surface"     "maxspeed"    "ref"        
#>  [6] "sidewalk"    "incline"     "oneway"      "prow_ref"    "designation"

Created on 2020-10-28 by the reprex package (v0.3.0)

The keys you are interested in can then be added through extra_tags, using the same syntax as before.

The argument attributes can be used in get_geofabrik. The default values, from make_additional_attributes(layer) seem arbitrary. Why this selection?

The default value now is NULL!

One suggestion would be to include the key into the attributes. For instance this does not work:

I think that this is not relevant anymore, and the same goes for the following questions. I am now testing the differences between osmextract and the data from mapcruzin.com.

agila5 commented 3 years ago

Reprex for the "finding"

# packages
suppressPackageStartupMessages({
  library(osmextract)
  library(osmdata)
  library(sf)
  library(tmap)
})
tmap_mode("view")
#> tmap mode set to interactive viewing

osmextract approach

ile_multipoly_osmext <- oe_get(
  place = "Ile-de-France",
  quiet = FALSE,
  layer = "multipolygons",
  extra_tags = "waterway",
  query = "SELECT * FROM \"multipolygons\" WHERE waterway = 'riverbank'"
)
#> The input place was matched with: Ile-de-France
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `multipolygons' from data source `C:\Users\Utente\Documents\osm_data\geofabrik_ile-de-france-latest.gpkg' using driver `GPKG'
#> Simple feature collection with 105 features and 26 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 1.559279 ymin: 48.10158 xmax: 3.458908 ymax: 49.12312
#> geographic CRS: WGS 84

tm_shape(ile_multipoly_osmext) +
  tm_polygons(col = "waterway")
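The key/value filtering asked about in the original post can be expressed through the query argument. As a rough sketch, the following base R helper builds such a query string. The function build_tag_query() is hypothetical (it is not part of osmextract), and it assumes that the key is already available as a column, for example because it was listed in extra_tags.

```r
# Hypothetical helper (not part of osmextract): build an SQL query that
# filters a layer on one OSM key and, optionally, one value.
build_tag_query <- function(layer, key, value = NULL) {
  if (is.null(value)) {
    # No value given: keep every feature that carries the key at all
    condition <- sprintf("%s IS NOT NULL", key)
  } else {
    # Double any single quote, then wrap the value as an SQL string literal
    condition <- sprintf("%s = '%s'", key, gsub("'", "''", value, fixed = TRUE))
  }
  sprintf("SELECT * FROM %s WHERE %s", layer, condition)
}

build_tag_query("multipolygons", "waterway", "riverbank")
#> [1] "SELECT * FROM multipolygons WHERE waterway = 'riverbank'"
build_tag_query("multipolygons", "waterway")
#> [1] "SELECT * FROM multipolygons WHERE waterway IS NOT NULL"
```

The second call is one way to write the "select all" case: it asks for every feature where the key is set, rather than matching a literal "*" value.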

osmdata approach

ile_osmdata <- opq("Ile-de-France") %>%
  add_osm_feature(key = "waterway", value = "riverbank") %>%
  osmdata_sf()

Print result

ile_osmdata
#> Object of class 'osmdata' with:
#>                  $bbox : 48.1201456,1.4462445,49.241431,3.5592208
#>         $overpass_call : The call submitted to the overpass API
#>                  $meta : metadata including timestamp and version numbers
#>            $osm_points : 'sf' Simple Features Collection with 178818 points
#>             $osm_lines : 'sf' Simple Features Collection with 171 linestrings
#>          $osm_polygons : 'sf' Simple Features Collection with 2353 polygons
#>        $osm_multilines : NULL
#>     $osm_multipolygons : 'sf' Simple Features Collection with 121 multipolygons
ile_multipoly_osmdata <- unname_osmdata_sf(ile_osmdata)$osm_multipolygons

Plot result

tm_shape(ile_multipoly_osmdata) +
  tm_polygons(col = "waterway")

It looks more or less the same as before. The only differences come from riverbanks close to the boundary of the Ile-de-France area: they fall inside the bounding box used by osmdata but outside the polygon used by osmextract:

tm_shape(
  ile_multipoly_osmdata[
    ! ile_multipoly_osmdata$osm_id %in% ile_multipoly_osmext$osm_id,
    ]
) + 
  tm_polygons()
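The bounding box versus polygon effect can be reproduced with a toy example. This is only a sketch with made-up planar coordinates (no CRS is set): the triangle stands in for the region boundary and the point stands in for a riverbank near the border.

```r
library(sf)

# A triangular "region" standing in for the boundary polygon (toy coordinates)
region <- st_sfc(st_polygon(list(rbind(c(0, 0), c(4, 0), c(0, 4), c(0, 0)))))
# The bounding box of the region, converted to a polygon
region_bbox <- st_as_sfc(st_bbox(region))
# A feature near the top-right corner of the bounding box
feature <- st_sfc(st_point(c(3.5, 3.5)))

# The feature is inside the bounding box but outside the region itself
in_bbox <- st_intersects(feature, region_bbox, sparse = FALSE)[1, 1]
in_region <- st_intersects(feature, region, sparse = FALSE)[1, 1]
c(in_bbox = in_bbox, in_region = in_region)
```

A query based on the bounding box would return this feature, while an extract clipped to the region polygon would not.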

Use mapcruzin data. Download and unzip

download.file(
  "https://mapcruzin.com/download-shapefile/france-waterways-shape.zip",
  "france-waterways-shape.zip"
)
unzip("france-waterways-shape.zip", exdir = tempdir())

Read-in data

ile_multipoly_mapcruzin <- sf::st_read(tempdir(), stringsAsFactors = FALSE)
#> Reading layer `waterways' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmpc1fKIQ' using driver `ESRI Shapefile'
#> Simple feature collection with 9963 features and 4 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -5.107661 ymin: 41.37122 xmax: 9.548755 ymax: 51.04924
#> geographic CRS: WGS 84

The geometry type is LINESTRING instead of POLYGON/MULTIPOLYGON. Filter to the Ile-de-France area:

ile_multipoly_mapcruzin <- ile_multipoly_mapcruzin[
  st_geometry(geofabrik_zones)[geofabrik_zones$name == "Ile-de-France"] %>%
    st_set_crs(4326),
  ]
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar

Check results. Tbh I don't know how to compare the two approaches:

table(ile_multipoly_mapcruzin$type)
#> 
#>      aqueduc     aqueduct        canal canal; river          dam         dock 
#>            6            6           67            1            2            2 
#>        drain     La Seine    lock_gate        river       stream         weir 
#>           34            1           14          160          355           14
mapview::mapview(ile_multipoly_mapcruzin, zcol = "type")

They look quite different, for example

tm_shape(ile_multipoly_mapcruzin) + 
  tm_lines() + 
  tm_view(set.view = c(2.31799, 48.91706, 11))

tm_shape(ile_multipoly_osmext) + 
  tm_polygons() + 
  tm_view(set.view = c(2.31799, 48.91706, 11))

Created on 2020-10-28 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Italian_Italy.1252
#>  ctype    Italian_Italy.1252
#>  tz       Europe/Berlin
#>  date     2020-10-28
#>
#> - Packages -------------------------------------------------------------------
#>  package           * version    date       lib source
#>  abind               1.4-5      2016-07-21 [1] CRAN (R 3.6.0)
#>  assertthat          0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports           1.1.10     2020-09-15 [1] CRAN (R 3.6.3)
#>  base64enc           0.1-3      2015-07-28 [1] CRAN (R 3.6.0)
#>  brew                1.0-6      2011-04-13 [1] CRAN (R 3.6.0)
#>  callr               3.5.0      2020-10-08 [1] CRAN (R 3.6.3)
#>  class               7.3-17     2020-04-26 [1] CRAN (R 3.6.3)
#>  classInt            0.4-3      2020-04-07 [1] CRAN (R 3.6.3)
#>  cli                 2.0.2      2020-02-28 [1] CRAN (R 3.6.3)
#>  codetools           0.2-16     2018-12-24 [2] CRAN (R 3.6.3)
#>  colorspace          1.4-1      2019-03-18 [1] CRAN (R 3.6.3)
#>  crayon              1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  crosstalk           1.1.0.1    2020-03-13 [1] CRAN (R 3.6.3)
#>  curl                4.3        2019-12-02 [1] CRAN (R 3.6.1)
#>  DBI                 1.1.0      2019-12-15 [1] CRAN (R 3.6.3)
#>  desc                1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools            2.3.2      2020-09-18 [1] CRAN (R 3.6.3)
#>  dichromat           2.0-0      2013-01-24 [1] CRAN (R 3.6.0)
#>  digest              0.6.25     2020-02-23 [1] CRAN (R 3.6.3)
#>  dplyr               1.0.2      2020-08-18 [1] CRAN (R 3.6.3)
#>  e1071               1.7-4      2020-10-14 [1] CRAN (R 3.6.3)
#>  ellipsis            0.3.1      2020-05-15 [1] CRAN (R 3.6.3)
#>  evaluate            0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fansi               0.4.1      2020-01-08 [1] CRAN (R 3.6.2)
#>  farver              2.0.3      2020-01-16 [1] CRAN (R 3.6.2)
#>  fs                  1.5.0      2020-07-31 [1] CRAN (R 3.6.3)
#>  gdtools             0.2.2      2020-04-03 [1] CRAN (R 3.6.3)
#>  generics            0.0.2      2018-11-29 [1] CRAN (R 3.6.0)
#>  glue                1.4.2      2020-08-27 [1] CRAN (R 3.6.3)
#>  highr               0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools           0.5.0      2020-06-16 [1] CRAN (R 3.6.3)
#>  htmlwidgets         1.5.2      2020-10-03 [1] CRAN (R 3.6.3)
#>  httr                1.4.2      2020-07-20 [1] CRAN (R 3.6.3)
#>  jsonlite            1.7.1      2020-09-07 [1] CRAN (R 3.6.3)
#>  KernSmooth          2.23-17    2020-04-26 [1] CRAN (R 3.6.3)
#>  knitr               1.30       2020-09-22 [1] CRAN (R 3.6.3)
#>  lattice             0.20-41    2020-04-02 [1] CRAN (R 3.6.3)
#>  leafem              0.1.3      2020-07-26 [1] CRAN (R 3.6.3)
#>  leaflet             2.0.3      2019-11-16 [1] CRAN (R 3.6.3)
#>  leaflet.providers   1.9.0      2019-11-09 [1] CRAN (R 3.6.1)
#>  leafpop             0.0.6      2020-09-22 [1] CRAN (R 3.6.3)
#>  leafsync            0.1.0      2019-03-05 [1] CRAN (R 3.6.1)
#>  lifecycle           0.2.0      2020-03-06 [1] CRAN (R 3.6.2)
#>  lubridate           1.7.9      2020-06-08 [1] CRAN (R 3.6.3)
#>  lwgeom              0.2-5      2020-06-12 [1] CRAN (R 3.6.3)
#>  magrittr            1.5        2014-11-22 [1] CRAN (R 3.6.3)
#>  mapview             2.9.0      2020-08-11 [1] CRAN (R 3.6.3)
#>  memoise             1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  mime                0.9        2020-02-04 [1] CRAN (R 3.6.2)
#>  munsell             0.5.0      2018-06-12 [1] CRAN (R 3.6.0)
#>  osmdata           * 0.1.3.020  2020-10-28 [1] Github (ropensci/osmdata@fc2332e)
#>  osmextract        * 0.1.0      2020-10-24 [1] local
#>  pillar              1.4.6      2020-07-10 [1] CRAN (R 3.6.3)
#>  pkgbuild            1.1.0      2020-07-13 [1] CRAN (R 3.6.3)
#>  pkgconfig           2.0.3      2019-09-22 [1] CRAN (R 3.6.1)
#>  pkgload             1.1.0      2020-05-29 [1] CRAN (R 3.6.3)
#>  png                 0.1-7      2013-12-03 [1] CRAN (R 3.6.0)
#>  prettyunits         1.1.1      2020-01-24 [1] CRAN (R 3.6.2)
#>  processx            3.4.4      2020-09-03 [1] CRAN (R 3.6.3)
#>  ps                  1.3.4      2020-08-11 [1] CRAN (R 3.6.3)
#>  purrr               0.3.4      2020-04-17 [1] CRAN (R 3.6.3)
#>  R6                  2.4.1      2019-11-12 [1] CRAN (R 3.6.1)
#>  raster              3.3-13     2020-07-17 [1] CRAN (R 3.6.3)
#>  RColorBrewer        1.1-2      2014-12-07 [1] CRAN (R 3.6.0)
#>  Rcpp                1.0.5      2020-07-06 [1] CRAN (R 3.6.3)
#>  remotes             2.2.0      2020-07-21 [1] CRAN (R 3.6.3)
#>  rlang               0.4.7      2020-07-09 [1] CRAN (R 3.6.3)
#>  rmarkdown           2.4        2020-09-30 [1] CRAN (R 3.6.3)
#>  rprojroot           1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  rvest               0.3.6      2020-07-25 [1] CRAN (R 3.6.3)
#>  satellite           1.0.2      2019-12-09 [1] CRAN (R 3.6.2)
#>  scales              1.1.1      2020-05-11 [1] CRAN (R 3.6.3)
#>  sessioninfo         1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  sf                * 0.9-6      2020-09-13 [1] CRAN (R 3.6.3)
#>  sp                  1.4-2      2020-05-20 [1] CRAN (R 3.6.3)
#>  stars               0.4-4      2020-08-19 [1] Github (r-spatial/stars@b7b54c8)
#>  stringi             1.4.6      2020-02-17 [1] CRAN (R 3.6.2)
#>  stringr             1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  svglite             1.2.3.2    2020-07-07 [1] CRAN (R 3.6.3)
#>  systemfonts         0.3.2      2020-09-29 [1] CRAN (R 3.6.3)
#>  testthat            2.3.2      2020-03-02 [1] CRAN (R 3.6.3)
#>  tibble              3.0.3.9000 2020-07-12 [1] Github (tidyverse/tibble@a57ad4a)
#>  tidyselect          1.1.0      2020-05-11 [1] CRAN (R 3.6.3)
#>  tmap              * 3.2        2020-09-15 [1] CRAN (R 3.6.3)
#>  tmaptools           3.1        2020-07-12 [1] Github (mtennekes/tmaptools@947f3bd)
#>  units               0.6-7      2020-06-13 [1] CRAN (R 3.6.3)
#>  usethis             1.6.3      2020-09-17 [1] CRAN (R 3.6.3)
#>  uuid                0.1-4      2020-02-26 [1] CRAN (R 3.6.3)
#>  vctrs               0.3.4      2020-08-29 [1] CRAN (R 3.6.3)
#>  viridisLite         0.3.0      2018-02-01 [1] CRAN (R 3.6.0)
#>  webshot             0.5.2      2019-11-22 [1] CRAN (R 3.6.1)
#>  withr               2.3.0      2020-09-22 [1] CRAN (R 3.6.3)
#>  xfun                0.18       2020-09-29 [1] CRAN (R 3.6.3)
#>  XML                 3.99-0.3   2020-01-20 [1] CRAN (R 3.6.2)
#>  xml2                1.3.2      2020-04-23 [1] CRAN (R 3.6.3)
#>  yaml                2.2.1      2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library
```

TLDR: The results obtained with osmextract and osmdata are more or less the same so I think that we are getting the right data. I don't know how to compare with mapcruzin data.

Robinlovelace commented 3 years ago

I think this issue can now be closed.