Closed Robinlovelace closed 3 years ago
Draft of submission:
Submitting Author: Name (@agila5)
Repository: ITSLeeds/osmextract
Version submitted:
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Package: osmextract
Type: Package
Title: Download and Read OpenStreetMap Data Extracts
Version: 0.1.0
Authors@R: c(
person("Andrea", "Gilardi", role=c("aut", "cre"),
comment = c(ORCID = "0000-0002-9424-7439")),
person("Robin", "Lovelace", email = "rob00x@gmail.com", role = c("aut"),
comment = c(ORCID = "0000-0001-5679-6536")),
person("Barry", "Rowlingson", role=c("ctb"),
comment = c(ORCID = "0000-0002-8586-6625"))
)
Description: This package is used to download, convert and read Open Street
Map data extracts downloaded from several providers.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)
URL: https://github.com/itsleeds/osmextract
BugReports: https://github.com/itsleeds/osmextract/issues
Depends: R (>= 3.5.0)
Imports:
sf (>= 0.8.1),
utils,
tools
Suggests:
testthat,
knitr,
rmarkdown
VignetteBuilder: knitr
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
osmextract
is for accessing and downloading OpenStreetMap data from a variety (current 2) of providers. OpenStreetMap has numerous scientific applications, including in transport, ecological analysis of human impacts on the environment, spatial statistics and land use change and the package provides a way of extracting the unstructure compressed .pbf files into modern geographic data structures (sf
data frames).
The target audience is researchers, practitioners and citizens who want to use OpenStreetMap data for scientific research, especially people who require large datasets (from city to continental scale) for their work.
osmdata
provides an R interface
to the Overpass API,
which is ideal for downloading small OSM datasets. However, the API is
rate limited, making it hard to download large datasets.
We believe the package complies with ethical guidelines in the Internet Research: Ethical Guidelines 3.0 document. It makes it easier for researchers to access and make use of data that is already in the public domain, under the conditions of the adhering to the conditions of the OdBL.
Confirm each of the following by checking the box.
This package:
[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:
We intend to submit the paper to another journal at present
Reopen to discuss the review process in https://github.com/ropensci/software-review/issues/395
I tried to reproduce the errors found in the editor's check using my laptop (running Windows) and my server (running ubuntu 16.04) but with no success :cry:
IMO, the warning messages are not that important since they are simply caused by different versions of GDAL that do not support WKT (like on my server), while provider's data in osmextract
was built using WKT. The problem (and the solution) are described here: https://github.com/r-spatial/sf/issues/1419 I didn't worry about those warning messages since Edzer said to ignore them. I added a note in the README:
but maybe we should also add another note in the package-startup-message.
Reprex on the installation:
# package is missing
library(osmextract)
#> Error in library(osmextract): there is no package called 'osmextract'
# download pkg from github
temp_zip <- tempfile(fileext = ".zip")
download.file(
url = "https://github.com/ITSLeeds/osmextract/archive/master.zip",
destfile = temp_zip
)
unzip(temp_zip, exdir = tempdir())
# install
devtools::install(paste0(tempdir(), "/osmextract-master"), dependencies = TRUE, build_vignettes = TRUE)
#>
#> checking for file ‘/tmp/RtmpL9uJxp/osmextract-master/DESCRIPTION’ ... ✓ checking for file ‘/tmp/RtmpL9uJxp/osmextract-master/DESCRIPTION’
#> ─ preparing ‘osmextract’:
#> checking DESCRIPTION meta-information ... ✓ checking DESCRIPTION meta-information
#> ─ installing the package to build vignettes
#> creating vignettes ... ✓ creating vignettes (13.7s)
#> ─ checking for LF line-endings in source and make files and shell scripts
#> ─ checking for empty or unneeded directories
#> ─ looking to see if a ‘data/datalist’ file should be added
#> ─ building ‘osmextract_0.1.0.tar.gz’
#>
#> Running /usr/lib/R/bin/R CMD INSTALL /tmp/RtmpL9uJxp/osmextract_0.1.0.tar.gz \
#> --install-tests
#> * installing to library ‘/home/dsuser/R/x86_64-pc-linux-gnu-library/3.6’
#> * installing *source* package ‘osmextract’ ...
#> ** using staged installation
#> ** R
#> ** data
#> *** moving datasets to lazyload DB
#> ** inst
#> ** tests
#> ** byte-compile and prepare package for lazy loading
#> ** help
#> *** installing help indices
#> *** copying figures
#> ** building package indices
#> ** installing vignettes
#> ** testing if installed package can be loaded from temporary location
#> ** testing if installed package can be loaded from final location
#> ** testing if installed package keeps a record of temporary installation path
#> * DONE (osmextract)
# tests
devtools::test(paste0(tempdir(), "/osmextract-master"))
#> Loading osmextract
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Any product made from OpenStreetMap must cite OSM as the data source.
#> Geofabrik data are taken from https://download.geofabrik.de/
#> For usage details of bbbike data see https://download.bbbike.org/osm/
#> OpenStreetMap_fr data are taken from http://download.openstreetmap.fr/
#> Testing osmextract
#> ✓ | OK F W S | Context
#> ⠏ | 0 | download⠋ | 0 1 | download⠙ | 1 1 | download⠴ | 5 1 | download✓ | 6 1 | download [0.7 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-download.R:3: warning: oe_download: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | find⠋ | 0 1 | find⠼ | 1 4 | find✓ | 3 4 | find [0.4 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-find.R:2: warning: oe_find: simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-find.R:6: warning: oe_find: simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-find.R:6: warning: oe_find: simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-find.R:6: warning: oe_find: simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | get✓ | 1 1 | get
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-get.R:2: warning: oe_get: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | match⠏ | 4 6 | match⠙ | 8 14 | match⠋ | 12 19 | match⠏ | 24 26 | match✓ | 26 28 | match [0.4 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-match.R:2: warning: oe_match: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:2: warning: oe_match: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:3: warning: oe_match: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:3: warning: oe_match: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:14: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:14: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:18: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:18: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:23: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:24: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:38: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:38: warning: oe_match: sfc_POINT objects
#> st_centroid does not give correct centroids for longitude/latitude data
#>
#> test-match.R:43: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:43: warning: oe_match: sfc_POINT objects
#> st_centroid does not give correct centroids for longitude/latitude data
#>
#> test-match.R:43: warning: oe_match: sfc_POINT objects
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:43: warning: oe_match: sfc_POINT objects
#> st_centroid does not give correct centroids for longitude/latitude data
#>
#> test-match.R:50: warning: oe_match: numeric input
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:50: warning: oe_match: numeric input
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:55: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:56: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:56: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:58: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:59: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:59: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:60: warning: oe_match: different providers, match_by or max_string dist args
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:84: warning: oe_check_pattern: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:85: warning: oe_check_pattern: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-match.R:86: warning: oe_check_pattern: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | providers✓ | 1 6 | providers
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-providers.R:2: warning: oe_providers works correctly
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | read⠹ | 3 | read✓ | 3 1 | read [0.7 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-read.R:23: skip: or_read: simplest example with a URL works
#> Reason: empty test
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠏ | 0 | updateThe .gpkg files are going to be removed.
#> ⠹ | 1 2 | update✓ | 1 2 | update [0.3 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-update.R:4: warning: oe_update(): simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-update.R:10: warning: oe_update(): simplest example works
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> ⠸ | 1 3 | update⠹ | 3 | vectortranslate✓ | 7 2 | vectortranslate [0.8 s]
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#> test-vectortranslate.R:42: warning: oe_get_keys: simplest examples work
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#>
#> test-vectortranslate.R:49: warning: oe_get_keys: returns error with wrong inputs
#> st_crs<- : replacing crs does not reproject data; use st_transform for that
#> ────────────────────────────────────────────────────────────────────────────────────────────────────
#>
#> ══ Results ═════════════════════════════════════════════════════════════════════════════════════════
#> Duration: 3.4 s
#>
#> OK: 48
#> Failed: 0
#> Warnings: 45
#> Skipped: 1
#>
#> Your tests are tremendous!
Created on 2020-09-30 by the reprex package (v0.3.0)
@Robinlovelace do you have any suggestion?
I guess it's a GDAL version issue. In terms of tackling the issue in the package, could we add a version check on package load along the following lines?
if(sf::sf_extSoftVersion()["GDAL"] < "3.0.0" ) warning("Not all features are supported, try updating your GDAL version. See https://github.com/r-spatial/sf#installing")
It seems to me that on Windows GDAL 3.0.4 is out in the binary install: https://github.com/rwinlib/gdal3
Regarding MacOS, which I guess @annakrystalli (apologies for the tag and thanks ; ) is running, I think updating GDAL as per instructions here have a chance of fixing it: https://github.com/r-spatial/sf#macos.
I noticed that some of our functions require recent versions of the 'OSGeo stack', including GEOS and PROJ, so it may also be an issue with some versions of those libraries. Good news: this issue will tend to fade as OSs update (it also affects some versions of Ubuntu I recall from previous CI issues that were fixed by adding the ubuntugis-unstable
repo on Ubuntu 16.04).
In terms of tackling the issue in the package, could we add a version check on package load along the following lines?
Ok, I agree, I will add a note to .onAttach()
(or .onLoad()
, I don't know). To be precise, the package still supports all the features even with "old" GDAL, the only problem is that when it runs the following code
https://github.com/ITSLeeds/osmextract/blob/b11f22ea795e45ed548cd15f450ba48f3ed003d6/R/providers.R#L37-L39
then it may return a warning message.
Regarding MacOS, which I guess @annakrystalli (apologies for the tag and thanks ; ) is running, I think updating GDAL as per instructions here have a chance of fixing it: https://github.com/r-spatial/sf#macos.
Ok thanks, I will try to write an answer after adding that note!
Hi! I will add here some notes related to the reviews of the package (see https://github.com/ropensci/software-review/issues/395#issuecomment-737551925). First of all, thank you very much @potterzot for your comments.
oe_match()
(and parent functions, like oe_get()
). I would use NULL
as default value and it would behave as follows: place
matches only one geographical area, then level
is ignored since there is no ambiguity regarding the chosen place. place
matches multiple areas, level
is NULL
(the default) and the function is run in an interactive session, then the function will print an interactive menu asking the user to select one of the intersecting areas. The interactive menu should present a short summary of all areas. place
matches multiple areas, level
is NULL
(the default) and the function is not run in an interactive session, then the function will return the geographical area with the highest level that is closest to the input place. place
matches multiple areas and level
is not NULL
, the function will return the geographical area with the chosen level that is closest to the input place. It will stop with an error, if no geographical area corresponds to the given level. In any case, I agree that the warning message could be made clearer, using the term smallest administrative unit
instead of level
.
Moreover, I agree that the warning messages could be improved since an important warning could be lost in the "wall of text". I would fix that following the ideas in https://github.com/ITSLeeds/osmextract/issues/133
Another comment from me: I think oe_download()
should also work if you give it a place name or lat/lon. Can open another issue but thought of it while reading this. Great to see this overview.
Hi @Robinlovelace, can you check the following response?
Dear potterzot and salvafern, thank you very much for your reviews. You provided friendly comments and constructive criticisms that, I'm sure, helped us improve our package defining new useful functions and arguments. In the following parts, I will provide several answers to all your comments.
Please add a CONTRIBUTING.md or a section to the README that details how to contribute as per the community guidelines. Contributing guidelines: not included in the README or in a separated CONTRIBUTING file. Maybe add a reference to the providers vignette in the README.
We added the following text at the end of the README: We very much look forward to comments, questions and contributions. If you have any doubt, or if you want to suggest a new approach or add a new OSM provider, feel free to create a new issue in the issue tracker or a new pull request. We always try to build the most intuitive user interface and write the most informative error messages, but if you think that something is not clear and could have been explained better, please let us know.
There are four warnings when I run devtools::test(), but looking at them I think that is expected behavior since the messages reflect tests of mispecified function arguments. Testing: 4 warnings (See my environment info below)
We decided to suppress some warning messages related to spatial operations in oe_match()
, while the remaining ones are expected since they are related to tests regarding the warning messages.
I ran into trouble when running oe_match. From the help message, it seems like when a location matches to multiple levels, I could select the level I want. oe_match doesn't take a level parameter, but it also doesn't issue a warning about a level parameter. I ended up thinking it was fine and then also trying to use level = 2 in oe_get, which did issue a warning that was couched in other output, but then returned an error that seemed when I read it unrelated to the inclusion of the level. I suggest either changing the wording to make it clear that you cannot select a level or adding a level parameter.
We recently added a new level
parameter to choose between multiple geographically nested areas. The default behaviour is the same as before, i.e. it selects the area associated with the highest level, which is linked with the smallest administrative unit:
yak = c(-120.51084, 46.60156)
osmextract::oe_match(yak)
#> The input place was matched with multiple geographical areas.
#> Selecting the smallest administrative unit. Check ?oe_match for more details.
#> The input place was matched with multiple zones at the same level. Check ?oe_match for more details.
#> Selecting the area whose centroid is closest to the input place.
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
#> $url
#> [1] "https://download.geofabrik.de/north-america/us/washington-latest.osm.pbf"
#>
#> $file_size
#> [1] 180538717
Created on 2021-01-18 by the reprex package (v0.3.0)
The level
argument can be used to choose a bigger administrative area:
yak = c(-120.51084, 46.60156)
osmextract::oe_match(yak, level = 1)
#> The input place was matched with multiple geographical areas.
#> Selecting the desired level.
#> $url
#> [1] "https://download.geofabrik.de/north-america-latest.osm.pbf"
#>
#> $file_size
#> [1] 10776557870
Created on 2021-01-18 by the reprex package (v0.3.0)
It returns an error message if there is no area with the chosen level.
yak = c(-120.51084, 46.60156)
osmextract::oe_match(yak, level = 3)
#> The input place was matched with multiple geographical areas.
#> Selecting the desired level.
#> Error in oe_match.sfc(place, provider = provider, quiet = quiet, ...): The input place does not intersect any area at the chosen level.
Created on 2021-01-18 by the reprex package (v0.3.0)
On a related issue, it's hard to know what "highest level" in the message means without reading. It might be clearer to say something like "smallest administrative unit" or "smallest geography", I'm not sure what the best term would be.
The oe_get function provides a warning message along those lines, but it gets lost in the rest of the output and ultimately throws an error that is hard to relate back to the inclusion of the level argument:
We decided to suppress some warnings and simplify some messages. For example, the current output could be something like:
#> The input place was matched with multiple geographical areas.
#> Selecting the smallest administrative unit. Check ?oe_match for more details.
#> The input place was matched with multiple zones at the same level. Check ?oe_match for more details.
#> Selecting the area whose centroid is closest to the input place.
Do you think it's better now? We did not include many details in the message but just added a reference to the help pages.
On my first look at the README I thought that oe_get() could retrieve geometries from anywhere, and got a bit disappointed when I saw it was not the case.
We changed the behaviour of oe_match()
(and, as a consequence, oe_get()
). Now, if the approximate string distance between the closest string match and the input place
is greater than max_string_dist
, then oe_match()
will also check the other supported providers. For example:
osmextract::oe_match("Madrid")
#> No exact match found for place = Madrid and provider = geofabrik. Best match is Mali.
#> Checking the other providers.
#> An exact string match was found using provider = bbbike.
#> $url
#> [1] "https://download.bbbike.org/osm/bbbike/Madrid/Madrid.osm.pbf"
#>
#> $file_size
#> [1] 33969933
Created on 2021-01-18 by the reprex package (v0.3.0)
We added a link between osmextract and Nominatim provider, which implies that oe_get()
could retrieve geometries from anywhere. If the input place
is not perfectly matched with any of the supported providers and match_by
argument is equal to "name"
, then oe_match()
will use the Nominatim API to geolocate the input place
and perform a spatial matching operation:
osmextract::oe_match("Ghent")
#> Warning: The input place was matched with multiple geographical zones: Ghana -
#> Kent. Selecting the first match.
#> No exact match found for place = Ghent and provider = geofabrik. Best match is Ghana.
#> Checking the other providers.
#> No exact match found in any OSM provider data. Searching for the location online.
#> The input place was matched with multiple geographical areas.
#> Selecting the smallest administrative unit. Check ?oe_match for more details.
#> $url
#> [1] "https://download.geofabrik.de/europe/belgium-latest.osm.pbf"
#>
#> $file_size
#> [1] 423345383
Created on 2021-01-18 by the reprex package (v0.3.0)
Code style: I ran styler:::style_active_pkg() and it changed 23 out of 25 files. I didn't compare the changes but maybe you want to have a look.
I think that this is simply caused by the fact that we use the =
operator for assignment instead of <-
.
Citation: You could add your preferred citation (See https://devguide.ropensci.org/building.html#readme)
We are working on a paper and we will add citation information as soon as possible.
Continuous Integration: Apart of the code coverage, are you planning to add a CI system?
We use Github Actions: https://github.com/ITSLeeds/osmextract/actions. Here you can browse the corresponding .yaml
files.
We added the following text at the end of the README: We very much look forward to comments, questions and contributions. If you have any doubt, or if you want to suggest a new approach or add a new OSM provider, feel free to create a new issue in the issue tracker or a new pull request. We always try to build the most intuitive user interface and write the most informative error messages, but if you think that something is not clear and could have been explained better, please let us know.
I suggest adding a link to the issue tracker.
We added a link between osmextract and Nominatim provider, which implies that
oe_get()
could retrieve geometries from anywhere. If the inputplace
is not perfectly matched with any of the supported providers andmatch_by
argument is equal to"name"
, thenoe_match()
will use the Nominatim API to geolocate the inputplace
and perform a spatial matching operation:
I suggest a slight reword:
We added a simple interface to Nominatim that enables `oe_get()` to 'geolocated' text strings that cannot be found in the providers. If the input `place` is not perfectly matched with any of the supported providers and `match_by` argument is equal to `"name"`, then `oe_match()` will now, after https://github.com/ITSLeeds/osmextract/pull/155, use the Nominatim API to geolocate the input `place` and perform a spatial matching operation:
Pending those 2 minor suggested changes this is ready to ship :ship: good work :+1:
😄 🚀 🎉
As advocated by @agila5 here https://github.com/ITSLeeds/geofabrik/issues/32#issuecomment-577038388