Closed erzk closed 6 years ago
Thanks for this submission to the geospatial area. While I seek reviewers, please look over some checks from goodpractice
below. Not all will be relevant (e.g. test coverage is already quite high) but there are other issues worth fixing before the reviewers begin.
Reviewers: @Robinlovelace, @cvitolo Due date: February 6, 2018
── GP PostcodesioR ───────────────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 97% of code lines are covered by test cases.
R/outcode_reverse_geocoding.R:27:NA
R/place_lookup.R:18:NA
R/place_query.R:20:NA
✖ not use "Depends" in DESCRIPTION, as it can cause name
clashes, and poor interaction with other packages. Use "Imports" instead.
✖ use '<-' for assignment instead of '='. '<-' is the
standard, and R users and developers are used it and it is easierto read your code for them if you use '<-'.
R/postcode_lookup.R:26:41
tests/testthat/test-place_query.R:6:17
tests/testthat/test-postcode_autocomplete.R:7:17
tests/testthat/test-postcode_query.R:8:17
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters
R/outcode_reverse_geocoding.R:22:1
R/place_query.R:25:1
R/postcode_autocomplete.R:25:1
R/postcode_query.R:25:1
R/reverse_geocoding.R:22:1
... and 7 more lines
✖ avoid sapply(), it is not type safe. It might return a
vector, or a list, depending on the input data. Consider using vapply() instead.
R/postcode_lookup.R:26:13
R/random_place.R:19:21
✖ not import packages as a whole, as this can cause name
clashes between the imported packages. Instead, import only the specific functions you need.
✖ fix this R CMD check WARNING: LaTeX errors when creating
PDF version. This typically indicates Rd problems.
✖ fix this R CMD check ERROR: Re-running with no
redirection of stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -rf /tmp/Rtmpbpsw7A/Rd2pdf59e56e855476'
───────────────────────────────────────────────
@Robinlovelace is reviewer one. I've given you extra time since you're currently busy. Please let me know if you are able to get a review in by Feb 6th. 🙏
@cvitolo is reviewer 2. Thanks Claudia!
Sure will give it a go.
@karthik Here is my review.
@erzk Nice package and works like a charm, well done! Below are few comments, the vast majority are related to the documentation, only few to functionalities. I hope my comments are self-explanatory but if you need further explaination please do not hesitate to ask :)
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 4
It is good practice to output a table in which column names are unique. If there are duplicates, these columns cannot be easily subsetted by name. A table with duplicated column names is obtained when running the following command (taken from vignette):
lookup_result <- postcode_lookup("EC1Y8LX")
In lookup_result
, columns admin_district
, admin_county
, admin_ward
, parish
, parliamentary_constituency
, ccg
are duplicated. It is not simple to spot those duplicated labels, however, one way to go about this is to convert the data.frame to a tibble:
library(tibble)
lookup_result_tbl <- as.tibble(lookup_result)
This way, if you have duplicated column names, you get a self-explanatory error. In this case it returns:
Error: Columns admin_district
, admin_county
, admin_ward
, parish
, parliamentary_constituency
, ccg
, nuts
must have unique names.
I would suggest to make all the column names unique.
All the exported functions are documented but it would be extremely useful to have more info on the returned values.
Not sure why bulk_postcode_lookup
only works with lists with 2 or more elements. Why is that? This limitation should be documented in the help page or, alternatively, you could merge postcode_lookup
and bulk_postcode_lookup
in one unique function (this would me my preferred option). The function could check whether the input is a list or a string and then apply bulk_postcode_lookup
or postcode_lookup
, respectively.
The help page of bulk_reverse_geocoding needs updating:
geolocations
argument.geolocations
argument is a list. You probably meant the API method not this package? The documentation of nearest_postcode
should clarify how the nearest postcodes are selected. Is it sorting them based on the distances between postcode's centres?
Same question applies to nearest_outcode
and nearest_outcode_lonlat
.
There is a typo in the help page: 'an list' should be corrected to 'a list'
These functions are not mentioned in the vignette:
postcode_autocomplete
could be mentioned as it is a very useful.What is the difference between postcode_query
and postcode_lookup
?
The documentation should mention what data are retrieved in both cases.
Description section needs updating, as it seems to describe one of the nearest_*
functions.
It should also explain what it is meant by 'terminated'.
Many functions have a query limit of 100, can this lead to misleading results? Please expand and document this limitation in the appropriate help pages.
This is just a general question/curiosity: the Postcodesio API has an MIT license, have you thought of applying the same license rather than a GPL3?
When I run devtools::check() on the package I get 1 note:
Status: 1 NOTE checking top-level files ... NOTE Non-standard file/directory found at top level: ‘README.Rmd’
This note can be removed by adding README.Rmd to your .Rbuildignore file.
I noticed you added httr as a dependency. Please consider using the Import field and import only the functions you need, specifying them in the NAMESPACE file, see example below:
importFrom(httr, GET)
When I run lintr checks, it shows that the following lines are longer than 80 characters:
Please consider to split each of the above lines into two.
When I run lintr checks, it shows that you use sapply(). If you can, please consider using vapply() instead, as it returns a consistent data type.
The vignette runs fast and smoothly, well done!
The content of the vignette is pretty much the same as the README. I would suggest not to duplicate the content because it can be difficult to keep them synched.
If you want to keep a vignette, in my opinion, this should be a bit more descriptive than a README file (they often resemble the structure of journal papers). You could maybe expand the introduction to include information about postcodeio (open source, based exclusively on open data, etc.), the target audience of your package and its potential applications. In my opinion, your package has great potential not only as a stand-alone tool, but also as a back-end tool for shiny applications to map and filter geospatial information. It could also be a useful dependency for other packages, to filter data based on postcodes rather than counties/local authorities/coordinates (I have a couple of them, rnrfa and rdefra, that could make good use of PostcodeioR). Of course, it would be great to also have a more detailed description of each function (in which case you would use it, how the result compares to what you get from another function and so forth).
All the exported functions are documented but it would be extremely useful to have:
@cvitolo Thanks for the comments. Very useful. I'll address the issues that you mentioned ASAP.
@Robinlovelace Just wanted to give you a quick ping and see if you've had time to complete your review. Please let us know if you need more time.
@erzk Quick note: I’ll be away from the internet for about a week and will be back online around 02/21. So I will be back to editorial activities shortly.
No but will get on it.
The package includes all the following forms of documentation:
if (!require("devtools")) install.packages("devtools")
## Loading required package: devtools
devtools::install_github("erzk/PostcodesioR")
## Skipping install of 'PostcodesioR' from a github remote, the SHA1 (abc9fee6) has not changed since last install.
## Use `force = TRUE` to force installation
library(PostcodesioR)
## Loading required package: httr
works fine but the style is non-standard (expect a PR incoming)
Introduction.Rmd
vignette seems to duplicate README.Rmd
. I
suggest reducing this duplication to a minimum and making the README
shorter.URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).r1. UK postcodes are a widely used form of geo-referenced and are seldom easy to use, so the idea behind this package is welcome. I think it the approach of wrapping an online service could be useful for small, intermittent use cases. However for larger datasets, a package that does not rely on internet access would be more useful. I think this would be possible, based on this statment in the README:
postcodes.io provides full list of geolocation data that can be used locally without limitations.
r2. Would creating a package that did this be an option?
r3. Regarding style, the package does not adhere to Ropensci's style
guide in a number of ways. I suggest renaming the package postcodesio
r4. Lack of support for common spatial classes for input and output. Such as simple feature points.
r5. The output of calls to postcodes.io seems to be overly verbose. I am confident a data frame output type is more appropriate in this case, particularly as the result for a single lookup is a data frame:
lookup_result <- postcode_lookup("EC1Y8LX")
class(lookup_result)
## [1] "data.frame"
pc_list <- list(postcodes = c("PR3 0SG", "M45 6GN", "EX165BL"))
bulk_lookup_result <- bulk_postcode_lookup(pc_list)
class(bulk_lookup_result)
## [1] "list"
# str(bulk_lookup_result[1]) # not shown: overly verbose!
r6. At times the suggested input formats are not user friendly. I do not think this is a realistic user entry, for example:
# create a JSON object with the coordinates
geolocations_list <- structure(
list(
geolocations = structure(
list(
longitude = c(-3.15807731271522, -1.12935802905177),
latitude = c(51.4799900627036, 50.7186356978817),
limit = c(NA, 100L),
radius = c(NA, 500L)),
.Names = c("longitude", "latitude", "limit", "radius"),
class = "data.frame",
row.names = 1:2)),
.Names = "geolocations")
bulk_rev_geo <- bulk_reverse_geocoding(geolocations_list)
It would be more useful if the user were given an easy function to
create such inputs into bulk_reverse_geocoding
r7. httr would be better as an Imports than as a Depends (I'm finally transitioning away from Depending on sp for my stplanr package)
Overall great package, thanks for the opportunity to review it and I hope this feedback is useful!
@Robinlovelace thanks for the review. I was away for a while so I couldn't work on the package but now I'll get back to it.
@cvitolo I made changes as you recommended. I answered your comments in more detail:
postcode_lookup
returns a data frame and postcode_query
a list. I don't think there was any particular reason for that. I will check with the API authors. postcode_query
call allows limiting the results. The rest seems the same.goodpractice
package which suggested these changes but using vapply()
breaks my code and would need extra steps to reshape the output of sapply()
postcode_lookup()
). Returned values are described in the original documentation. I refer users to it. The description itself is quite long and I don't think it needs to be repeated. Also, see point 2.Thanks again for reviewing the code. I still need to address Robin's comments.
@Robinlovelace Thanks for the comments, they were very helpful.
Statement of need was added. README was shortened and examples moved to a vignette. Installation: what's non-standard about the style? what would you suggest?
r1. I agree. r2. The next step will be to create a package (or new functions in this package) that can achieve the same thing locally. Any suggestions would be very helpful. r3. postcodesio would not be search engine friendly. postcodesior would be better, without the capital cases I used but that would mean renaming not only the package but also some blog posts or links. If it's not a major issue, I'd prefer to stick to the old name. I will use the lowercase for future packages. r4. Can you elaborate? I'm using the basic cases that are allowed/returned by the API. r5. I agree. Data frames would be easier to use. The problem here was that in some cases the lists that are returned had unequal number of elements thus turning them into a data frame was tricky. Will try to find an example. It was more tricky than expected. PR would be welcome if you have a solution. r6. I clarified this example and showed how to turned a data frame into a format required by the function. r7. fixed. @cvitolo suggested the same thing. I've tried it before but I must have done something wrong.
@erzk thanks for addressing all my comments. Great job!
Please find below few more suggestions.
Maybe your data.frame needs list-columns?
I know it's a tedious task but a full description of the expected output is needed. If you have multiple functions (e.g. A, B, C) generating the same table, you could add the full description to function A. Function B and C can simply contain a link to the description in function A. Does it make sense?
Alternatively you can add links to the original API documentation. But please try to add them systematically to every help page.
OK, it makes sense to keep the R functions as similar to the function calls as possible. Do you mind to add a note on this to the package help pages (postcode_lookup and bulk_postcoe_lookup)?
One more thing, at the moment if your postcode
input argument for postcode_lookup() is a vector of two postcodes, the function fails:
> postcode_lookup(c("EC1Y8LX", "RG13DQ"))
Error: length(url) == 1 is not TRUE
It would be great to make this error more meaningful. Maybe you could add something like this at the beginnig of postcode_lookup():
if (length(nchar(postcode)) > 1){
stop("This function accepts only one postcode, for multiple requests please use bulk_postcode_lookup().")
}
This way, when you run the function with the wrong input, the user gets a clear suggestion to fix the problem:
> postcode_lookup(c("EC1Y8LX", "RG13DQ"))
Error in postcode_lookup(c("EC1Y8LX", "RG13DQ")) :
This function accepts one postcode, for multiple postcodes please use bulk_postcode_lookup().
OK, let me know when you fix the JSON vs list bit.
Perfect, let me know when the info is added to the documentation.
There is still an inconsistency to fix in the documentation: the Value section says the function returns a list, while in reality it returns a data.frame (as also stated in the Description section).
OK
I would suggest to add a comment on this in the relevant help pages.
OK
Is there a way to change the default API limit of 100 calls? If so, is there a way to change that default parameter in your package as well?
OK
No problem :)
In my opinion, it is better to 'consciously' import the functions you need because that eliminates the problem of masking functions by mistake.
OK
OK
OK
That's fine. It would be very convenient to have links to the original documentation in the help pages as well.
Comments post-review:
r1. I agree. + r2. The next step will be to create a package (or new functions in this package) that can achieve the same thing locally. Any suggestions would be very helpful.
No suggestions at present, other than to download all the boundaries used by the API and use a spatial package such as sf to extract the boundaries for any point. This could link to my prototype https://github.com/Robinlovelace/ukboundaries/ package which I hope (with some encouragement from @karthik and perhaps support from the ONS) to submit to ROpenSci at some point.
r3. postcodesio would not be search engine friendly. postcodesior would be better, without the capital cases I used but that would mean renaming not only the package but also some blog posts or links. If it's not a major issue, I'd prefer to stick to the old name. I will use the lowercase for future packages.
It's not a major issue.
r4. Can you elaborate? I'm using the basic cases that are allowed/returned by the API.
I think this is a major issue. R has some add-on packages that make it intuitive to process, analyse and model spatial data. Premier among these for vector data is sf. If it could take sf
objects in and perhaps return sf
objects out that would be a great benefit. I suggest taking a look at https://github.com/ropensci/osmdata (and my own stplanr package). These allow inputs and outputs in various spatial formats which is very useful if you work with spatial data which presumably users of PostcodesioR will.
r5. I agree. Data frames would be easier to use. The problem here was that in some cases the lists that are returned had unequal number of elements thus turning them into a data frame was tricky. Will try to find an example. It was more tricky than expected. PR would be welcome if you have a solution.
Like @cvitolo I think list columns or empty columns could save you here. Unlikely to find time to put in a PR to do this with the priority being to complete the https://github.com/Robinlovelace/geocompr book at present.
r6. I clarified this example and showed how to turned a data frame into a format required by the function.
Great work.
r7. fixed. @cvitolo suggested the same thing. I've tried it before but I must have done something wrong.
Awesome.
@cvitolo
postcode_lookup()
, place_lookup()
, and terminated_postcode()
. They return three different sets of values as described in the API [documentation](#' @seealso \code{\link{place_lookup}} for documentation.). The remaining functions have references to one of these three relevant functions.@param limit
was added to specify the limit.@importFrom httr GET
was added. Thanks for the revisions @erzk and for the detailed and thoughtful reviews @cvitolo and @Robinlovelace Can you two look over the latest revisions and sign off if you are satisfied? 🙏
I'm satisfied with the submission and the updates, great to be part of the peer review process. If I think of any further issues would they be best as issues in the issue tracker or as comments here? (Guess: it depends on whether they are directly related to the review process.) Fantastic work @erzk!
If I think of any further issues would they be best as issues in the issue tracker or as comments here?
It would be best on the issue tracker of the repo itself (the location will change once accepted, but GitHub will redirect appropriately).
I'm satisfied with the changes, well done @erzk!
On Mon, 16 Apr 2018, 22:27 Karthik Ram, notifications@github.com wrote:
If I think of any further issues would they be best as issues in the issue tracker or as comments here?
It would be best on the issue tracker of the repo itself (the location will change once accepted, but GitHub will redirect appropriately).
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/176#issuecomment-381755550, or mute the thread https://github.com/notifications/unsubscribe-auth/AEhdr12paEOboRb3a4ZKcVa4wF_O2V5qks5tpQzMgaJpZM4RRE9J .
Congrats @erzk! your package has been approved! 🎉 Thank you for submitting and @cvitolo and @Robinlovelace for thorough and timely reviews. To-dos:
[x] Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
[x] Add the rOpenSci footer to the bottom of your README
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.
Final suggestion. If you found feedback from the reviewers valuable, consider adding them as reviewers in the description of your package (this is entirely up to you and not a requirement). If you decide to, you'll need:
@karthik All done. @stefaniebutland I'm happy to write a blog post about the package. @cvitolo @Robinlovelace Thanks for reviewing the code. I added you as reviewers in the DESCRIPTION file.
Thanks @erzk! That was fast! I'll close this issue since the review is complete but you will hear further from Stefanie on the possibility of writing a blog post.
@erzk Great to hear you're interested in contributing a post about PostcodesioR
. You can read example narrative posts about onboarded packages here: https://ropensci.org/tags/onboarding/ and technotes here: https://ropensci.org/technotes/. Technotes can be published at any time, while blog posts are scheduled in advance.
Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post.
We ask that you submit a draft via pull request at least one week before the planned publication date. At this point there are several blog posts in the pipeline, so perhaps best for you to suggest a deadline by which you would like to submit a draft.
Summary
It is an API wrapper for http://postcodes.io/ which is a free geolocation service for the UK data.
https://github.com/erzk/PostcodesioR
Geospatial data, because the package deals with UK geospatial data.
Whoever needs information about UK postcodes, outcodes, or places. This package will be particularly useful for social scientists.
Not that I'm aware of. This package return a rich geospatial data related to postcodes (but not only).
NA
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Detail
[x] Does
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: