ropensci / osmextract

Download and import OpenStreetMap data from Geofabrik and other providers
https://docs.ropensci.org/osmextract
GNU General Public License v3.0

[FEATURE] Download previous versions of the data from geofabrik #295

Open juanfonsecaLS1 opened 2 months ago

juanfonsecaLS1 commented 2 months ago

Is your feature request related to a problem? Please describe. Sometimes it is useful to have access to previous versions of the OSM data. Currently, osmextract only gets the most recent version from the different providers.

Describe the solution you'd like oe_get could take a parameter indicating that the user wants a previous version of the data from Geofabrik. Alternatively, a vignette could be created explaining how to do it.

Describe alternatives you've considered This is some code that works for getting an arbitrary version of the pbf file of Colombia:

library(osmextract)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Check the package website, https://docs.ropensci.org/osmextract/, for more details.
library(rvest)

col_match <- oe_match("Colombia",provider = "geofabrik")
#> The input place was matched with: Colombia

u <- dirname(col_match$url)
f <- basename(col_match$url)

# Strip the "latest.osm.pbf" suffix to keep the region prefix of the file name
id_files <- gsub("latest\\.osm\\.pbf", replacement = "", f)

files_table <- (read_html(u) |> html_table())[[1]]

head(files_table)
#> # A tibble: 6 × 5
#>   ``    Name                                `Last modified`    Size  Description
#>   <lgl> <chr>                               <chr>              <chr> <lgl>      
#> 1 NA    ""                                  ""                 ""    NA         
#> 2 NA    "Parent Directory"                  ""                 "-"   NA         
#> 3 NA    "argentina-140101-free.shp.zip"     "2018-04-27 06:55" "99M" NA         
#> 4 NA    "argentina-140101-free.shp.zip.md5" "2018-05-03 17:18" "64"  NA         
#> 5 NA    "argentina-140101.osm.pbf"          "2014-01-01 23:35" "58M" NA         
#> 6 NA    "argentina-150101-free.shp.zip"     "2018-04-27 06:51" "133… NA

# Keep only the dated .osm.pbf snapshots (prefix + YYMMDD) for this region
available_versions <- grep(paste0(id_files, "\\d{6}\\.osm\\.pbf$"), files_table$Name, value = TRUE)

head(available_versions)
#> [1] "colombia-140101.osm.pbf" "colombia-150101.osm.pbf"
#> [3] "colombia-160101.osm.pbf" "colombia-170101.osm.pbf"
#> [5] "colombia-180101.osm.pbf" "colombia-190101.osm.pbf"

# Read the 10th dated snapshot straight from its URL
net_old <- oe_read(file_path = paste0(u, "/", available_versions[10]))
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `lines' from data source 
#>   `C:\Users\...\Documents\OSMEXT_downloads\geofabrik_colombia-230101.gpkg' 
#>   using driver `GPKG'
#> Simple feature collection with 1087521 features and 9 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -85.94982 ymin: -4.503316 xmax: -66.57158 ymax: 26.00379
#> Geodetic CRS:  WGS 84

Created on 2024-09-05 with reprex v2.1.1
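
The filtering step in the reprex above could be wrapped into small helpers, sketched below under some assumptions. `dated_versions()` and `pick_version()` are hypothetical names and are not part of osmextract. The sketch assumes the dated files follow the `<region>-YYMMDD.osm.pbf` pattern seen in the listing, and that the region prefix contains no regex metacharacters. It uses base R only and works on a plain character vector of file names, so it can be tried without downloading anything.

```r
# Hypothetical helpers (not part of osmextract); base R only.

# Keep the dated .osm.pbf files of one region and parse the YYMMDD stamp.
dated_versions <- function(file_names, latest_file) {
  # "colombia-latest.osm.pbf" -> "colombia-"
  prefix <- sub("latest\\.osm\\.pbf$", "", latest_file)
  pattern <- paste0("^", prefix, "([0-9]{6})\\.osm\\.pbf$")
  hits <- grep(pattern, file_names, value = TRUE)
  out <- data.frame(
    file = hits,
    date = as.Date(sub(pattern, "\\1", hits), format = "%y%m%d"),
    stringsAsFactors = FALSE
  )
  # Oldest snapshot first
  out[order(out$date), , drop = FALSE]
}

# Pick the most recent snapshot dated on or before `target`.
pick_version <- function(versions, target) {
  target <- as.Date(target)
  ok <- versions[versions$date <= target, , drop = FALSE]
  if (nrow(ok) == 0) {
    stop("No snapshot on or before ", format(target))
  }
  ok$file[nrow(ok)]
}

# Sample file names in the style of the directory listing
files <- c(
  "Parent Directory",
  "argentina-140101.osm.pbf",
  "argentina-150101-free.shp.zip",
  "colombia-140101.osm.pbf",
  "colombia-150101.osm.pbf",
  "colombia-latest.osm.pbf"
)

v <- dated_versions(files, "colombia-latest.osm.pbf")
chosen <- pick_version(v, "2014-12-31")
print(chosen)
```

The selected file name can then be appended to `u` and passed to oe_read, as in the last step of the reprex.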


Robinlovelace commented 2 months ago

👍

agila5 commented 2 months ago

Thank you very much for your suggestion! I think it's a nice and reasonable idea, and I'll do my best to implement it as soon as possible (maybe as an additional argument to oe_match & parents).