ropensci / osmextract

Download and import OpenStreetMap data from Geofabrik and other providers
https://docs.ropensci.org/osmextract
GNU General Public License v3.0

Mismatch between GEOMETRY in `osmextract::geofabrik_zones` and actual clipping boundary after update #270

Closed wlangera closed 1 year ago

wlangera commented 1 year ago

Hi

I noticed a small mistake in the osm layer for Belgium from geofabrik: the layer did not completely encompass Belgium.

As a result, when I tried to extract data for a region that includes the mismatched area (using oe_get()), the extract for the whole of Europe was downloaded instead of the one for Belgium. That file is far larger than I need: Belgium is a small country, so having to download the entire map of Europe is a bit silly.

The people from geofabrik promptly fixed this, but oe_get() (via oe_match()) still wants to use the whole of Europe ... I discovered that this is because the GEOMETRY in osmextract::geofabrik_zones is not up to date with the actual clipping boundary of the osm layer on the geofabrik website. The clipping boundaries are available as country_name.kml files on https://download.geofabrik.de/europe/.
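To illustrate the failure mode described above, here is a toy sketch of containment-based matching. It assumes the matcher picks the smallest zone that fully contains the input, which is consistent with the behaviour reported in this thread but is not osmextract's actual code. `match_smallest_zone()` and `rect()` are hypothetical helpers, and the rectangles use made-up coordinates without a CRS.

```r
library(sf)

# Build a rectangular polygon from its corner coordinates
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Toy zones standing in for provider boundaries (made-up coordinates)
zones <- st_sf(
  name = c("Europe", "Belgium"),
  geometry = st_sfc(rect(-10, 35, 40, 70), rect(2.6, 49.5, 6.4, 51.5))
)

# Hypothetical helper: name of the smallest zone that fully contains `area`
match_smallest_zone <- function(area, zones) {
  hits <- zones[lengths(st_contains(zones, area)) > 0, ]
  if (nrow(hits) == 0L) stop("No zone contains the area.", call. = FALSE)
  hits$name[which.min(st_area(hits))]
}

inside <- st_sfc(rect(3, 50, 5, 51))
sticking_out <- st_sfc(rect(2.5, 50, 5, 51))  # 0.1 units beyond the Belgium zone

match_smallest_zone(inside, zones)        # "Belgium"
match_smallest_zone(sticking_out, zones)  # "Europe"
```

Even a thin strip outside the smaller zone is enough to make the containment test fail, so the match falls back to the next zone up.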

Therefore my question: Can you update osmextract::geofabrik_zones GEOMETRY such that it is up to date with the actual layers behind the download links?

Or even better: Would it be possible to automate this process? If there is an update from geofabrik (or any other provider), the GEOMETRY is updated automatically as well.

Kind regards, Ward

agila5 commented 1 year ago

Hi Ward! Thank you very much for opening this issue and for your extremely detailed explanation of the problem.

> Can you update osmextract::geofabrik_zones GEOMETRY such that it is up to date with the actual layers behind the download links?

Yes! Unfortunately, this week I'm really busy but, hopefully, I can work on this issue (and, maybe, a new CRAN release) starting from the beginning of the next week (I'm not sure yet, sorry...)

> Would it be possible to automate this process? If there is an update from geofabrik (or any other provider), the GEOMETRY is updated automatically as well.

TBH, I don't know... Technically that might be the best way to proceed, but I need to check whether it's really easy to automate this process or not.

Robinlovelace commented 1 year ago

Just to add: great that Geofabrik have updated their boundaries. I'm confident that this is an improvement. And I agree: with the .kml files it should be possible to write a script that updates geofabrik_zones.
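A minimal sketch of the kind of update script mentioned above, assuming a zones table with an `id` column and exactly one boundary feature per file. `refresh_zone_geometry()` and `rect()` are hypothetical helpers, not osmextract functions, and the coordinates are made up so the example runs offline.

```r
library(sf)

# Hypothetical helper (not part of osmextract): replace the geometry of one
# zone in an sf table that has an `id` column with a fresh boundary.
refresh_zone_geometry <- function(zones, zone_id, boundary) {
  row <- which(zones$id == zone_id)
  stopifnot(inherits(zones, "sf"), length(row) == 1L)
  # The workaround in this thread drops Z/M dimensions after reading the KML,
  # so the same is done here
  new_geom <- st_geometry(st_zm(boundary, drop = TRUE, what = "ZM"))
  stopifnot(length(new_geom) == 1L)  # assumes one feature per boundary file
  new_geom <- st_transform(new_geom, st_crs(zones))
  geom <- st_geometry(zones)
  geom[row] <- new_geom
  st_geometry(zones) <- geom
  zones
}

# Toy data with made-up coordinates
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}
zones <- st_sf(
  id = c("europe", "belgium"),
  geometry = st_sfc(rect(-10, 35, 40, 70), rect(2.6, 49.5, 6.4, 51.5), crs = 4326)
)
fresh <- st_sfc(rect(2.5, 49.5, 6.4, 51.5), crs = 4326)

updated <- refresh_zone_geometry(zones, "belgium", fresh)
st_bbox(updated[updated$id == "belgium", ])
```

With real data, `boundary` would presumably come from something like `sf::st_read("belgium.kml", quiet = TRUE)`; whether every provider file contains a single feature is an assumption that would need checking.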

wlangera commented 1 year ago

Thanks for your quick reply. Not sure if this will help the package, but I will share my current workaround with you.

First a function that can check whether my sf object (called area) is within the boundary of the osm layer of Belgium. The argument update_osm_layer can be used to force download even if the osm files are already locally stored on your computer.

#' \code{check_osm_data} checks whether your polygon ( `sf` object) falls
#' within the boundaries of the osm layer (!only for Belgium here)
#' 
#' @param area An `sf` object
#' @param update_osm_layer Logical used to indicate whether the osm file
#' should be updated even if it has already been downloaded (TRUE/FALSE)

check_osm_data <- function(area, update_osm_layer) {
  # Download boundaries osm Belgium
  provider_file <- file.path(osmextract::oe_download_directory(),
                             "belgium.kml")
  # Download if file is not downloaded yet or if you want to force download
  if (!file.exists(provider_file) || update_osm_layer) {
    download.file("https://download.geofabrik.de/europe/belgium.kml",
                  provider_file)
  } else {
    message("The chosen file was already detected in the download directory. Skip downloading.")
  }
  provider_data <- sf::st_read(provider_file, quiet = TRUE) |>
    sf::st_zm(drop = TRUE, what = "ZM")

  # Does the area fall within Belgium osm boundaries?
  matched_zones <- provider_data[sf::st_transform(area,
   crs = sf::st_crs(provider_data)), op = sf::st_contains]
  if (nrow(matched_zones) != 0L) {
    osmextract::oe_download("https://download.geofabrik.de/europe/belgium-latest.osm.pbf",
                            force_download = update_osm_layer)
  } else {
    stop("Belgium osm layer does not encompass given area!", call. = FALSE)
  }
}

Now to get the land uses for my area:

area <- ... # some sf object

check_osm_data(area, update_osm_layer = TRUE)

landuse_vectortranslate = c(
    "-t_srs", "EPSG:31370",
    "-select", "landuse",
    "-nlt", "PROMOTE_TO_MULTI"
  )

osmdata <- file.path(osmextract::oe_download_directory(),
                     "geofabrik_belgium-latest.osm.pbf")

area_landuse <- osmextract::oe_read(
    file_path = osmdata,
    layer = "multipolygons",
    download_directory = dirname(osmdata),
    vectortranslate_options = landuse_vectortranslate,
    boundary = area,
    boundary_type = "clipsrc")

agila5 commented 1 year ago

Thanks for your reply and the suggested code. My only worry is that the geofabrik server might complain if I try to download all .kml files at once (and my IP was already banned once from their website when I tried something similar...). I will double-check and update the package!
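One way to reduce the risk of being blocked would be to fetch the boundary files one at a time, skip files that are already cached, and pause between requests. The sketch below is an assumption, not what the package does: `download_politely()` and its 5-second default delay are hypothetical, and Geofabrik's actual rate limits are not stated in this thread.

```r
# Hypothetical helper: download files sequentially, reusing cached copies and
# sleeping between requests. `fetch` defaults to utils::download.file and can
# be swapped out (e.g. for testing without network access).
download_politely <- function(urls, dest_dir, delay = 5,
                              fetch = utils::download.file) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    if (file.exists(dest[i])) next  # already cached: no request needed
    fetch(urls[i], dest[i], quiet = TRUE)
    if (i < length(urls)) Sys.sleep(delay)  # pause before the next request
  }
  invisible(dest)
}

# Example call (not run here because it needs network access):
# download_politely(
#   "https://download.geofabrik.de/europe/belgium.kml",
#   file.path(tempdir(), "kml")
# )
```

Skipping cached files means that re-running the script after an interruption only requests what is still missing.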

agila5 commented 1 year ago

Hi @wlangera, could you please check if the new geofabrik_zones solves your problem? You can install the development version of the package by running the following command: remotes::install_github("ropensci/osmextract", "update-geofabrik-zones")

wlangera commented 1 year ago

Hi

It did not work. Apparently there is still a tiny strip where the osm layer does not encompass Belgium. I will contact geofabrik again and let you know.

Ward

agila5 commented 1 year ago

Hi @wlangera. Do you have any news?

wlangera commented 1 year ago

No, I will mail them again.

wlangera commented 1 year ago

Apparently they changed it, but they did not let me know. The perimeter for Belgium on the geofabrik website looks good now. So I think it should work if you update the osmextract::geofabrik_zones GEOMETRY.

agila5 commented 1 year ago

Ok, thanks! Could you please share the complete failing example so I can test it on my laptop?

wlangera commented 1 year ago

Here you go:

remotes::install_github("ropensci/osmextract", "update-geofabrik-zones")
#> Skipping install of 'osmextract' from a github remote, the SHA1 (4691ac69) has not changed since last install.
#>   Use `force = TRUE` to force installation

require(sf)
#> Loading required package: sf
#> Warning: package 'sf' was built under R version 4.2.2
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE

wfs_regions <- "https://eservices.minfin.fgov.be/arcgis/services/R2C/Regions/MapServer/WFSServer"
flanders <- read_sf(paste0("WFS:", wfs_regions),
              query = "select * from regions where NameDUT='Vlaams Gewest'") |>
  st_transform(crs = 4326) |>
  st_cast("GEOMETRYCOLLECTION")

osmextract::oe_match(flanders)
#> The input place was matched with Europe.
#> $url
#> [1] "https://download.geofabrik.de/europe-latest.osm.pbf"
#> 
#> $file_size
#> [1] 2.63e+10

Created on 2023-03-07 with reprex v2.0.2

agila5 commented 1 year ago

Hi @wlangera! The github version of the package now solves your problem.

remotes::install_github("ropensci/osmextract")
#> Skipping install of 'osmextract' from a github remote, the SHA1 (85d3116c) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 7.2.1; sf_use_s2() is TRUE

wfs_regions <- "https://eservices.minfin.fgov.be/arcgis/services/R2C/Regions/MapServer/WFSServer"
flanders <- read_sf(
  paste0("WFS:", wfs_regions),
  query = "select * from regions where NameDUT='Vlaams Gewest'"
) |>
  st_transform(crs = 4326) |>
  st_cast("GEOMETRYCOLLECTION")

osmextract::oe_match(flanders)
#> The input place was matched with Belgium.
#> $url
#> [1] "https://download.geofabrik.de/europe/belgium-latest.osm.pbf"
#> 
#> $file_size
#> [1] 5.11e+08

Created on 2023-03-07 with reprex v2.0.2

Thank you very much for raising this issue and for the extremely helpful feedback that you provided.

> Would it be possible to automate this process? If there is an update from geofabrik (or any other provider), the GEOMETRY is updated automatically as well.

The process is not really "automatic", but I just added a new "bullet point" to remind me to update the databases for all the providers before any CRAN release.
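As a possible aid for that manual step, a stored boundary could be compared with a freshly downloaded one to see whether an update is needed at all. This is only a sketch under stated assumptions: `boundary_changed()` is a hypothetical helper, each side is assumed to hold a single geometry, and the squares below use made-up coordinates in EPSG:31370.

```r
library(sf)

# Hypothetical helper: TRUE when the fresh boundary is not topologically
# equal to the stored one. Assumes one geometry on each side.
boundary_changed <- function(stored, fresh) {
  fresh <- st_transform(st_geometry(fresh), st_crs(stored))
  !st_equals(st_geometry(stored), fresh, sparse = FALSE)[1, 1]
}

# Toy squares with made-up coordinates (metres)
square <- function(size) {
  st_polygon(list(rbind(
    c(0, 0), c(size, 0), c(size, size), c(0, size), c(0, 0)
  )))
}
stored <- st_sfc(square(1000), crs = 31370)
fresh <- st_sfc(square(1100), crs = 31370)

boundary_changed(stored, stored)  # FALSE
boundary_changed(stored, fresh)   # TRUE
```

Exact topological equality is a strict criterion; a tolerance based on the area of the difference might be preferable in practice.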

wlangera commented 1 year ago

Thank you very much.

Kind regards, Ward