ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

PostcodesioR submission #176

Closed erzk closed 6 years ago

erzk commented 6 years ago

Summary

It is an API wrapper for http://postcodes.io/ which is a free geolocation service for the UK data.

Package: PostcodesioR
Type: Package
Title: API wrapper around postcodes.io (free UK postcode lookup and geocoder)
Version: 0.1.1
Authors@R: person("Eryk", "Walczak", email = "eryk.j.walczak@gmail.com", role = c("aut", "cre"))
Description: Free UK geocoding using data from Office for National Statistics.
    It is using several functions to get information about post codes, outward
    codes, reverse geocoding, nearest post codes/outward codes, validation, or
    randomly generate a post code.
License: GPL-3 + file LICENSE
URL: https://github.com/erzk/PostcodesioR/
LazyData: TRUE
Depends:
    R (>= 3.1),
    httr
RoxygenNote: 6.0.1
Suggests: knitr,
    rmarkdown,
    testthat,
    covr
VignetteBuilder: knitr
BugReports: https://github.com/erzk/PostcodesioR/issues

https://github.com/erzk/PostcodesioR

Geospatial data, because the package deals with UK geospatial data.

Whoever needs information about UK postcodes, outcodes, or places. This package will be particularly useful for social scientists.

Not that I'm aware of. This package return a rich geospatial data related to postcodes (but not only).

NA

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

karthik commented 6 years ago

Editor checks:


Editor comments

Thanks for this submission to the geospatial area. While I seek reviewers, please look over some checks from goodpractice below. Not all will be relevant (e.g. test coverage is already quite high) but there are other issues worth fixing before the reviewers begin.


Reviewers: @Robinlovelace, @cvitolo Due date: February 6, 2018

── GP PostcodesioR ───────────────────────────────────────────────
It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 97% of code lines are covered by test cases.

    R/outcode_reverse_geocoding.R:27:NA
    R/place_lookup.R:18:NA
    R/place_query.R:20:NA

  ✖ not use "Depends" in DESCRIPTION, as it can cause name
    clashes, and poor interaction with other packages. Use "Imports" instead.
  ✖ use '<-' for assignment instead of '='. '<-' is the
    standard, and R users and developers are used it and it is easierto read your code for them if you use '<-'.

    R/postcode_lookup.R:26:41
    tests/testthat/test-place_query.R:6:17
    tests/testthat/test-postcode_autocomplete.R:7:17
    tests/testthat/test-postcode_query.R:8:17

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters

    R/outcode_reverse_geocoding.R:22:1
    R/place_query.R:25:1
    R/postcode_autocomplete.R:25:1
    R/postcode_query.R:25:1
    R/reverse_geocoding.R:22:1
    ... and 7 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using vapply() instead.

    R/postcode_lookup.R:26:13
    R/random_place.R:19:21

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the specific functions you need.
  ✖ fix this R CMD check WARNING: LaTeX errors when creating
    PDF version. This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no
    redirection of stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -rf /tmp/Rtmpbpsw7A/Rd2pdf59e56e855476'
───────────────────────────────────────────────
erzk commented 6 years ago
karthik commented 6 years ago

@Robinlovelace is reviewer one. I've given you extra time since you're currently busy. Please let me know if you are able to get a review in by Feb 6th. 🙏

karthik commented 6 years ago

@cvitolo is reviewer 2. Thanks Claudia!

Robinlovelace commented 6 years ago

Sure will give it a go.

cvitolo commented 6 years ago

Package Review

@karthik Here is my review.

@erzk Nice package and works like a charm, well done! Below are few comments, the vast majority are related to the documentation, only few to functionalities. I hope my comments are self-explanatory but if you need further explaination please do not hesitate to ask :)

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4


Review Comments

1. Duplicated column names

It is good practice to output a table in which column names are unique. If there are duplicates, these columns cannot be easily subsetted by name. A table with duplicated column names is obtained when running the following command (taken from vignette):

lookup_result <- postcode_lookup("EC1Y8LX")

In lookup_result, columns admin_district, admin_county, admin_ward, parish, parliamentary_constituency, ccg are duplicated. It is not simple to spot those duplicated labels, however, one way to go about this is to convert the data.frame to a tibble:

library(tibble)
lookup_result_tbl <- as.tibble(lookup_result)

This way, if you have duplicated column names, you get a self-explanatory error. In this case it returns:

Error: Columns admin_district, admin_county, admin_ward, parish, parliamentary_constituency, ccg, nuts must have unique names.

I would suggest to make all the column names unique.

2. Function Documentation:

All the exported functions are documented but it would be extremely useful to have more info on the returned values.

3. bulk_postcode_lookup only works with lists with 2 or more elements

Not sure why bulk_postcode_lookup only works with lists with 2 or more elements. Why is that? This limitation should be documented in the help page or, alternatively, you could merge postcode_lookup and bulk_postcode_lookup in one unique function (this would me my preferred option). The function could check whether the input is a list or a string and then apply bulk_postcode_lookup or postcode_lookup, respectively.

4. The documentation of bulk_reverse_geocoding needs updating

The help page of bulk_reverse_geocoding needs updating:

*5. Logic of nearest_ functions**

The documentation of nearest_postcode should clarify how the nearest postcodes are selected. Is it sorting them based on the distances between postcode's centres? Same question applies to nearest_outcode and nearest_outcode_lonlat.

6. Function postcode_autocomplete

There is a typo in the help page: 'an list' should be corrected to 'a list'

7. Functions not mentioned in vignette

These functions are not mentioned in the vignette:

8. postcode_query

What is the difference between postcode_query and postcode_lookup? The documentation should mention what data are retrieved in both cases.

9. terminated_postcode

Description section needs updating, as it seems to describe one of the nearest_* functions.

It should also explain what it is meant by 'terminated'.

10. Query limitation

Many functions have a query limit of 100, can this lead to misleading results? Please expand and document this limitation in the appropriate help pages.

11. LICENSE

This is just a general question/curiosity: the Postcodesio API has an MIT license, have you thought of applying the same license rather than a GPL3?

12. Note in check

When I run devtools::check() on the package I get 1 note:

Status: 1 NOTE checking top-level files ... NOTE Non-standard file/directory found at top level: ‘README.Rmd’

This note can be removed by adding README.Rmd to your .Rbuildignore file.

13. DESCRIPTION file

I noticed you added httr as a dependency. Please consider using the Import field and import only the functions you need, specifying them in the NAMESPACE file, see example below:

importFrom(httr, GET)

14. Lines longer than 80 characters

When I run lintr checks, it shows that the following lines are longer than 80 characters:

Please consider to split each of the above lines into two.

15. Avoid sapply()

When I run lintr checks, it shows that you use sapply(). If you can, please consider using vapply() instead, as it returns a consistent data type.

16. Vignette

The vignette runs fast and smoothly, well done!

The content of the vignette is pretty much the same as the README. I would suggest not to duplicate the content because it can be difficult to keep them synched.

If you want to keep a vignette, in my opinion, this should be a bit more descriptive than a README file (they often resemble the structure of journal papers). You could maybe expand the introduction to include information about postcodeio (open source, based exclusively on open data, etc.), the target audience of your package and its potential applications. In my opinion, your package has great potential not only as a stand-alone tool, but also as a back-end tool for shiny applications to map and filter geospatial information. It could also be a useful dependency for other packages, to filter data based on postcodes rather than counties/local authorities/coordinates (I have a couple of them, rnrfa and rdefra, that could make good use of PostcodeioR). Of course, it would be great to also have a more detailed description of each function (in which case you would use it, how the result compares to what you get from another function and so forth).

17. Function documentation

All the exported functions are documented but it would be extremely useful to have:

erzk commented 6 years ago

@cvitolo Thanks for the comments. Very useful. I'll address the issues that you mentioned ASAP.

karthik commented 6 years ago

@Robinlovelace Just wanted to give you a quick ping and see if you've had time to complete your review. Please let us know if you need more time.

@erzk Quick note: I’ll be away from the internet for about a week and will be back online around 02/21. So I will be back to editorial activities shortly.

Robinlovelace commented 6 years ago

No but will get on it.

Robinlovelace commented 6 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

if (!require("devtools")) install.packages("devtools")

## Loading required package: devtools

devtools::install_github("erzk/PostcodesioR")

## Skipping install of 'PostcodesioR' from a github remote, the SHA1 (abc9fee6) has not changed since last install.
##   Use `force = TRUE` to force installation

library(PostcodesioR)

## Loading required package: httr

works fine but the style is non-standard (expect a PR incoming)

r1. UK postcodes are a widely used form of geo-referenced and are seldom easy to use, so the idea behind this package is welcome. I think it the approach of wrapping an online service could be useful for small, intermittent use cases. However for larger datasets, a package that does not rely on internet access would be more useful. I think this would be possible, based on this statment in the README:

postcodes.io provides full list of geolocation data that can be used locally without limitations.

r2. Would creating a package that did this be an option?

r3. Regarding style, the package does not adhere to Ropensci's style guide in a number of ways. I suggest renaming the package postcodesio

r4. Lack of support for common spatial classes for input and output. Such as simple feature points.

r5. The output of calls to postcodes.io seems to be overly verbose. I am confident a data frame output type is more appropriate in this case, particularly as the result for a single lookup is a data frame:

lookup_result <- postcode_lookup("EC1Y8LX")
class(lookup_result)

## [1] "data.frame"

pc_list <- list(postcodes = c("PR3 0SG", "M45 6GN", "EX165BL"))
bulk_lookup_result <- bulk_postcode_lookup(pc_list)

class(bulk_lookup_result)

## [1] "list"

# str(bulk_lookup_result[1]) # not shown: overly verbose!

r6. At times the suggested input formats are not user friendly. I do not think this is a realistic user entry, for example:

# create a JSON object with the coordinates
geolocations_list <- structure(
 list(
 geolocations = structure(
 list(
 longitude = c(-3.15807731271522, -1.12935802905177),
 latitude = c(51.4799900627036, 50.7186356978817),
 limit = c(NA, 100L),
 radius = c(NA, 500L)),
 .Names = c("longitude", "latitude", "limit", "radius"),
 class = "data.frame",
 row.names = 1:2)),
 .Names = "geolocations")

bulk_rev_geo <- bulk_reverse_geocoding(geolocations_list)

It would be more useful if the user were given an easy function to create such inputs into bulk_reverse_geocoding

r7. httr would be better as an Imports than as a Depends (I'm finally transitioning away from Depending on sp for my stplanr package)

Overall great package, thanks for the opportunity to review it and I hope this feedback is useful!

erzk commented 6 years ago

@Robinlovelace thanks for the review. I was away for a while so I couldn't work on the package but now I'll get back to it.

erzk commented 6 years ago

@cvitolo I made changes as you recommended. I answered your comments in more detail:

  1. I'm not sure how to address this point. The problem here is that in some cases the API returns a structure that can't be turned into a data frame. I need to work on this a bit longer. TODO: df vs. list
  2. Additional information about returned data types was added to README. I think it might be too long to describe each data point in each function as they are repeating. Do you have other suggestion?
  3. The functions are split into two different calls by postcodes.io. I tried to keep the function as similar to the function calls as possible. The API specifies that bulk works with two or more postcodes. I added this info to documentation.
  4. Title fixed. TODO: JSON vs. list
  5. This info is missing from the API documentation. I will contact the authors for more detail.
  6. Fixed. Well spotted.
  7. Fixed
  8. Both function seem to be the same. postcode_lookup returns a data frame and postcode_query a list. I don't think there was any particular reason for that. I will check with the API authors. postcode_query call allows limiting the results. The rest seems the same.
  9. Fixed. Documentation is up to date now.
  10. Limits of 100 are the default in the API calls so I kept it the same in the package.
  11. GPL3 keeps the modifications free (and open source) so I prefer it to the MIT license.
  12. Fixed. Thanks for the tip. I had the same problem in my other packages.
  13. I haven't changed that as I rely strongly on httr. A helper function in zzz.R is calling three functions from httr and there are additional ones in functions. That's my only dependency so I'm loading it as is. What would be a benefit of importing each function from httr separately?
  14. Fixed
  15. I used goodpractice package which suggested these changes but using vapply() breaks my code and would need extra steps to reshape the output of sapply()
  16. The README files has new information about use cases and explains where the package can be useful. README was trimmed down because @Robinlovelace also suggested it's too long. The examples (except one) were moved to a vignette.
  17. Comments added to function documentation (e.g. postcode_lookup()). Returned values are described in the original documentation. I refer users to it. The description itself is quite long and I don't think it needs to be repeated. Also, see point 2.

Thanks again for reviewing the code. I still need to address Robin's comments.

erzk commented 6 years ago

@Robinlovelace Thanks for the comments, they were very helpful.

Statement of need was added. README was shortened and examples moved to a vignette. Installation: what's non-standard about the style? what would you suggest?

r1. I agree. r2. The next step will be to create a package (or new functions in this package) that can achieve the same thing locally. Any suggestions would be very helpful. r3. postcodesio would not be search engine friendly. postcodesior would be better, without the capital cases I used but that would mean renaming not only the package but also some blog posts or links. If it's not a major issue, I'd prefer to stick to the old name. I will use the lowercase for future packages. r4. Can you elaborate? I'm using the basic cases that are allowed/returned by the API. r5. I agree. Data frames would be easier to use. The problem here was that in some cases the lists that are returned had unequal number of elements thus turning them into a data frame was tricky. Will try to find an example. It was more tricky than expected. PR would be welcome if you have a solution. r6. I clarified this example and showed how to turned a data frame into a format required by the function. r7. fixed. @cvitolo suggested the same thing. I've tried it before but I must have done something wrong.

cvitolo commented 6 years ago

@erzk thanks for addressing all my comments. Great job!

Please find below few more suggestions.

  1. I'm not sure how to address this point. The problem here is that in some cases the API returns a structure that can't be turned into a data frame. I need to work on this a bit longer. TODO: df vs. list

Maybe your data.frame needs list-columns?

  1. Additional information about returned data types was added to README. I think it might be too long to describe each data point in each function as they are repeating. Do you have other suggestion?

I know it's a tedious task but a full description of the expected output is needed. If you have multiple functions (e.g. A, B, C) generating the same table, you could add the full description to function A. Function B and C can simply contain a link to the description in function A. Does it make sense?

Alternatively you can add links to the original API documentation. But please try to add them systematically to every help page.

  1. The functions are split into two different calls by postcodes.io. I tried to keep the function as similar to the function calls as possible. The API specifies that bulk works with two or more postcodes. I added this info to documentation.

OK, it makes sense to keep the R functions as similar to the function calls as possible. Do you mind to add a note on this to the package help pages (postcode_lookup and bulk_postcoe_lookup)?

One more thing, at the moment if your postcode input argument for postcode_lookup() is a vector of two postcodes, the function fails:

> postcode_lookup(c("EC1Y8LX", "RG13DQ"))
Error: length(url) == 1 is not TRUE

It would be great to make this error more meaningful. Maybe you could add something like this at the beginnig of postcode_lookup():

if (length(nchar(postcode)) > 1){
    stop("This function accepts only one postcode, for multiple requests please use bulk_postcode_lookup().")
  }

This way, when you run the function with the wrong input, the user gets a clear suggestion to fix the problem:

> postcode_lookup(c("EC1Y8LX", "RG13DQ"))
Error in postcode_lookup(c("EC1Y8LX", "RG13DQ")) : 
  This function accepts one postcode, for multiple postcodes please use bulk_postcode_lookup().
  1. Title fixed. TODO: JSON vs. list

OK, let me know when you fix the JSON vs list bit.

  1. This info is missing from the API documentation. I will contact the authors for more detail.

Perfect, let me know when the info is added to the documentation.

  1. Fixed. Well spotted.

There is still an inconsistency to fix in the documentation: the Value section says the function returns a list, while in reality it returns a data.frame (as also stated in the Description section).

  1. Fixed

OK

  1. Both function seem to be the same. postcode_lookup returns a data frame and postcode_query a list. I don't think there was any particular reason for that. I will check with the API authors. postcode_query call allows limiting the results. The rest seems the same.

I would suggest to add a comment on this in the relevant help pages.

  1. Fixed. Documentation is up to date now.

OK

  1. Limits of 100 are the default in the API calls so I kept it the same in the package.

Is there a way to change the default API limit of 100 calls? If so, is there a way to change that default parameter in your package as well?

  1. GPL3 keeps the modifications free (and open source) so I prefer it to the MIT license.

OK

  1. Fixed. Thanks for the tip. I had the same problem in my other packages.

No problem :)

  1. I haven't changed that as I rely strongly on httr. A helper function in zzz.R is calling three functions from httr and there are additional ones in functions. That's my only dependency so I'm loading it as is. What would be a benefit of importing each function from httr separately?

In my opinion, it is better to 'consciously' import the functions you need because that eliminates the problem of masking functions by mistake.

  1. Fixed

OK

  1. I used goodpractice package which suggested these changes but using vapply() breaks my code and would need extra steps to reshape the output of sapply()

OK

  1. The README files has new information about use cases and explains where the package can be useful. README was trimmed down because @Robinlovelace also suggested it's too long. The examples (except one) were moved to a vignette.

OK

  1. Comments added to function documentation (e.g. postcode_lookup()). Returned values are described in the original documentation. I refer users to it. The description itself is quite long and I don't think it needs to be repeated. Also, see point 2.

That's fine. It would be very convenient to have links to the original documentation in the help pages as well.

Robinlovelace commented 6 years ago

Comments post-review:

r1. I agree. + r2. The next step will be to create a package (or new functions in this package) that can achieve the same thing locally. Any suggestions would be very helpful.

No suggestions at present, other than to download all the boundaries used by the API and use a spatial package such as sf to extract the boundaries for any point. This could link to my prototype https://github.com/Robinlovelace/ukboundaries/ package which I hope (with some encouragement from @karthik and perhaps support from the ONS) to submit to ROpenSci at some point.

r3. postcodesio would not be search engine friendly. postcodesior would be better, without the capital cases I used but that would mean renaming not only the package but also some blog posts or links. If it's not a major issue, I'd prefer to stick to the old name. I will use the lowercase for future packages.

It's not a major issue.

r4. Can you elaborate? I'm using the basic cases that are allowed/returned by the API.

I think this is a major issue. R has some add-on packages that make it intuitive to process, analyse and model spatial data. Premier among these for vector data is sf. If it could take sf objects in and perhaps return sf objects out that would be a great benefit. I suggest taking a look at https://github.com/ropensci/osmdata (and my own stplanr package). These allow inputs and outputs in various spatial formats which is very useful if you work with spatial data which presumably users of PostcodesioR will.

r5. I agree. Data frames would be easier to use. The problem here was that in some cases the lists that are returned had unequal number of elements thus turning them into a data frame was tricky. Will try to find an example. It was more tricky than expected. PR would be welcome if you have a solution.

Like @cvitolo I think list columns or empty columns could save you here. Unlikely to find time to put in a PR to do this with the priority being to complete the https://github.com/Robinlovelace/geocompr book at present.

r6. I clarified this example and showed how to turned a data frame into a format required by the function.

Great work.

r7. fixed. @cvitolo suggested the same thing. I've tried it before but I must have done something wrong.

Awesome.

erzk commented 6 years ago

@cvitolo

  1. Fixed. The column names are not duplicated any more.
  2. Fixed. I added full documentation and URLs to postcode_lookup(), place_lookup(), and terminated_postcode(). They return three different sets of values as described in the API [documentation](#' @seealso \code{\link{place_lookup}} for documentation.). The remaining functions have references to one of these three relevant functions.
  3. Fixed. Extra documentation added. Error handling added.
  4. Fixed. Documentation update. R uses a list. API uses JSON.
  5. Fixed. The author updated the API documentation. So did I. Nearest postcodes/outcodes are based on outcode/postcode centroids.
  6. Fixed. Documentation says 'data frame'.
  7. Fixed. Documentation updated to explain differences between the functions.
  8. Fixed. Where possible @param limit was added to specify the limit.
  9. Fixed. @importFrom httr GET was added.
  10. Fixed. See point 2.
karthik commented 6 years ago

Thanks for the revisions @erzk and for the detailed and thoughtful reviews @cvitolo and @Robinlovelace Can you two look over the latest revisions and sign off if you are satisfied? 🙏

Robinlovelace commented 6 years ago

I'm satisfied with the submission and the updates, great to be part of the peer review process. If I think of any further issues would they be best as issues in the issue tracker or as comments here? (Guess: it depends on whether they are directly related to the review process.) Fantastic work @erzk!

karthik commented 6 years ago

If I think of any further issues would they be best as issues in the issue tracker or as comments here?

It would be best on the issue tracker of the repo itself (the location will change once accepted, but GitHub will redirect appropriately).

cvitolo commented 6 years ago

I'm satisfied with the changes, well done @erzk!

On Mon, 16 Apr 2018, 22:27 Karthik Ram, notifications@github.com wrote:

If I think of any further issues would they be best as issues in the issue tracker or as comments here?

It would be best on the issue tracker of the repo itself (the location will change once accepted, but GitHub will redirect appropriately).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/176#issuecomment-381755550, or mute the thread https://github.com/notifications/unsubscribe-auth/AEhdr12paEOboRb3a4ZKcVa4wF_O2V5qks5tpQzMgaJpZM4RRE9J .

karthik commented 6 years ago

Congrats @erzk! your package has been approved! 🎉 Thank you for submitting and @cvitolo and @Robinlovelace for thorough and timely reviews. To-dos:

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

Final suggestion. If you found feedback from the reviewers valuable, consider adding them as reviewers in the description of your package (this is entirely up to you and not a requirement). If you decide to, you'll need:

erzk commented 6 years ago

@karthik All done. @stefaniebutland I'm happy to write a blog post about the package. @cvitolo @Robinlovelace Thanks for reviewing the code. I added you as reviewers in the DESCRIPTION file.

karthik commented 6 years ago

Thanks @erzk! That was fast! I'll close this issue since the review is complete but you will hear further from Stefanie on the possibility of writing a blog post.

stefaniebutland commented 6 years ago

@erzk Great to hear you're interested in contributing a post about PostcodesioR. You can read example narrative posts about onboarded packages here: https://ropensci.org/tags/onboarding/ and technotes here: https://ropensci.org/technotes/. Technotes can be published at any time, while blog posts are scheduled in advance.

Here are some technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post.

We ask that you submit a draft via pull request at least one week before the planned publication date. At this point there are several blog posts in the pipeline, so perhaps best for you to suggest a deadline by which you would like to submit a draft.