ropensci / software-review

rOpenSci Software Peer Review.
287 stars 104 forks source link

fingertipsR #168

Closed sebastian-fox closed 6 years ago

sebastian-fox commented 6 years ago

Summary

Package: fingertipsR
Type: Package
Version: 0.1.3.9000
Title: Fingertips Data for Public Health
Description: Fingertips (<http://fingertips.phe.org.uk/>) contains data for many indicators of public health in England. The underlying data is now more easily accessible by making use of the API.
Depends: R (>= 2.10)
Authors@R: c(person("Sebastian", "Fox", , "sebastian.fox@phe.gov.uk", c("aut", "cre")),
    person("Julian", "Flowers", , "julian.flowers@phe.gov.uk", c("aut","ctb")))
URL: https://fingertips.phe.org.uk
BugReports: https://github.com/PublicHealthEngland/fingertipsR/issues
Imports:
    dplyr,
    jsonlite,
    data.table,
    httr,
    purrr,
    DT,
    miniUI,
    shiny,
    shinycssloaders,
    readr
Suggests:
    knitr,
    rmarkdown,
    testthat,
    ggplot2,
    pander,
    curl
VignetteBuilder: knitr
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1

https://github.com/PublicHealthEngland/fingertipsR

Data retrieval because it provides users with functions to import data for over 2,000 indicators of public health in England from the Fingertips website.

The target audience is anyone with an interest in public health. This includes researchers, public and private sector workers and the general public. The website gets almost one million hits in a year. The majority of users are in the public sector who need to make decisions that affect the public's health. This website helps inform those decisions. Increasingly, the website is being accessed by academic institutions. The fingertipsR package is already on CRAN and it is being discussed in some institutions as a future resource for teaching materials.

No

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

karthik commented 6 years ago

👋 sebastian-fox Thank you for your submission to rOpenSci's onboarding. I've tagged your submission as a presubmission while I do some preliminary checks before I assign an editor.

noamross commented 6 years ago

Editor checks:


Editor comments

Thank you for your submission, @sebastian-fox ! I see you are running the package on Travis already, but we do require a CI badge in your README as well as a continuous code coverage service (generally codecov or coveralls) and a badge from that, as well, so we can see package status at a glance. I'm seeking reviewers and will assign them once you've made this update.

Here is output from goodpractice::gp(). I see nothing sufficient to hold up assigning reviewers, but 58% test coverage is a bit on the low side, and the rest should be handled by the end of review. Things like line length and whole-package imports are up to the judgement of reviewers and this serves a reference for them and you.

── GP fingertipsR ────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 58% of code lines
    are covered by test cases.

    R/area_types.R:36:NA
    R/area_types.R:93:NA
    R/area_types.R:94:NA
    R/area_types.R:95:NA
    R/area_types.R:96:NA
    ... and 181 more lines

  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and
    developers are used it and it is easier to read your code for them if you use '<-'.

    R/select_indicators.R:48:33

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor
    windows that are about 80 characters wide. Try make your lines shorter than 80 characters

    R/area_types.R:36:1
    R/area_types.R:61:1
    R/area_types.R:66:1
    R/area_types.R:69:1
    R/deprivation_decile.R:38:1
    ... and 103 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on
    the input data. Consider using vapply() instead.

    R/fingertips_data.R:187:33
    R/fingertips_data.R:188:48
    tests/testthat/test-deprivation.R:21:22
    tests/testthat/test-deprivation.R:22:22
    tests/testthat/test-deprivation.R:23:22
    ... and 1 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...)
    expressions. They are error prone and result 1:0 if the expression on the right hand side is
    zero. Use seq_len() or seq_along() instead.

    R/retrieve_data.R:10:19

  ✖ not import packages as a whole, as this can cause name clashes between the imported
    packages. Instead, import only the specific functions you need.
──────────────────────────────────────────────

Reviewers: @carlganz, @nacnudus Due date: 2018-01-15

sebastian-fox commented 6 years ago

Thanks for taking a look at this. I have added the badges as requested. I have also improved the coverage of the package.

I've also updated some of the code based on the goodpractice feedback

noamross commented 6 years ago

Reviewers are @carlganz and @nacnudus. In light of the end-of-year holidays, due date is 2018-01-15.

noamross commented 6 years ago

@sebastian-fox You can add our Peer Review badge to your README. It will read "in-progress" and update after acceptance:

[![](https://badges.ropensci.org/168_status.svg)](https://github.com/ropensci/onboarding/issues/168)
nacnudus commented 6 years ago

Hi @sebastian-fox @noamross, here's my review.

Package Review

By Duncan Garmonsway @nacnudus

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

The fingertipsR package provides programmatic access to data published by Public Health England on the Fingertips website. This is useful in its own right, but especially because the downloads offered by the website appear to be exclusively as spreadsheets.

The package has some fairly heavy dependencies (both dplyr and data.table), but makes the most of shiny and related packages to provide a searchable index of datasets.

The vignettes demonstrate common usage and are easy to follow, and data structure is self-explanatory once it has been retrieved.

README

The introductory statement explains that the package is for interacting with a data tool called 'Fingertips'. It would be helpful if it also explained what the Fingertips data tool is.

The first installation instruction requires the devtools package. Please could you also give instructions that only use base R, e.g. https://stackoverflow.com/a/1474125/937932. Please can you also add an instruction for installing from CRAN, and make it clear that there is a difference, e.g. 'stable' vs 'development'.

Please can the example in the README be described in full. The current text is generic:

This is a basic example which shows you how to solve a common problem:

The instruction for browsing vignettes only works if the vignettes have been installed, which is not the default. If using devtools, then the call should be

devtools::install_github("PublicHealthEngland/fingertipsR",
                         build_vignettes = TRUE,
                         dependencies = "suggests")

Life Expectancy vignette

I read this vignette before reading any of the function documentation, so some of my points are answered in the docs (but it would still be helpful not to have to go to the docs for the answers).

The .md file committed to GitHub is out of date.

There seems to be a line missing

We need to begin by calling the fingertipsR package:

Should it be followed by

library(fingertipsR)

?

In the code chunk under 'IndicatorID', the comment beginning # Because the same indicators ... would be easier to read (and render more nicely as html!) if it were wrapped to lines of 80 characters as below. It also mentions 'profiles' without defining what profiles are. That is defined on the Fingertips website, but it would be helpful to repeat some information here for users who have come straight here. They might come here first if they are maintaining code written by others. From the packaging_guide:

  • If your package connects to a data source or online service, or wraps other software, consider that your package README may be the first point of entry for users. It should provide enough information for users to understand the nature of the data, service, or software, and provide links to other relevant data and documentation. For instance, a README should not merely read, "Provides access to GooberDB," but also include, "..., an online repository of Goober sightings in South America. More information about GooberDB, and documentation of database structure and metadata can be found at link.

If filtering the output of indicators() is a common workflow (code chunk below), then it might help users if there was a built-in function to return one record per IndicatorID.

inds <- indicators()
life_expectancy <- inds[grepl("life expectancy", tolower(inds$IndicatorName)),]

# Because the same indicators are used in multiple profiles, there are many
# repeated indicators in this table (some with varying IndicatorName but same
# IndicatorID)

# This returns a record for each IndicatorID
life_expectancy <-
  unique(life_expectancy[duplicated(life_expectancy$IndicatorID) == FALSE,
                        c("IndicatorID", "IndicatorName")])
knitr::kable(life_expectancy, row.names = FALSE)

areaTypes()$AreaTypeID is numeric vector, but only contains integers. Would you consider using an integer vector instead. Similarly areaTypes()$ParentAreaTypeID

Please could you explain the relationship between AreaTypeID and ParentAreaTypeID. It is another thing that is probably explained on the Fingertips website, but that would be convenient to know at this point in the vignette.

Typo: though it change should be though it changes.

The quoted paragraph below begs the question, are there any other defaults, and are they guessable? What's the rationale? Also, what do 101, 102 and 6 translate to?

ParentAreaTypeID is 6 by default for the fingertips_data() function for AreaTypeID of 102 (though it change if different AreaTypeIDs are entered), so we can stick with that in this example.

Typo in the docs for deprivation_decile(). Outputs a data frame allocating deprivation decile to area code base on should be based on.

The deprivation section interrupts the workflow from finding the IDs to downloading the data with fingertips_data(). It would flow better to finish downloading the data before seeking the deprivation deciles.

It also took me a while to understand what deprivation_decile() does. It downloads a dataset that is already available in Fingertips the usual way, filters it for the given IDs, renames a column nd recalculaties the decile column. It would be helpful to explain somewhere why the deprivation decile data gets this special treatment.

Should the error message below be "Year must be 2010, 2011, 2012 or 2015"?

    if (!(Year %in% c(2010, 2011, 2012, 2015))) {
        stop("Year must be either 2010 or 2015")
    }

The style of the following if statement could be improved by simplifying.

    if (!(Year %in% c(2010, 2011, 2012, 2015))) {
        stop("Year must be either 2010 or 2015")
    }
    else if (Year %in% c(2010, 2011, 2012)) {
        IndicatorID <- 338
    }
    else if (Year == 2015) {
        IndicatorID <- 91872
    }
    if (AreaTypeID == 101) {
        if (Year %in% c(2010, 2015)) {
            AreaFilter <- "District & UA"
        }
        else if (Year %in% c(2011, 2012)) {
            stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
        }
    }
    else if (AreaTypeID == 102) {
        if (Year %in% c(2010, 2015)) {
            AreaFilter <- "County & UA"
        }
        else if (Year %in% c(2011, 2012)) {
            stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
        }
    }
    else if (AreaTypeID == 7) {
        AreaFilter <- "GP"
    }
    else {
        stop("AreaTypeID must be either 101 (Local authority districts and Unitary Authorities), 102 (Counties and Unitary Authorities) or 7 (General Practice).")
    }

Here is a shorter version that I find easier to follow.

    if (!(Year %in% c(2010, 2011, 2012, 2015))) {
        stop("Year must be either 2010 or 2015")
    }
    if (!(AreaTypeID %in% c(101, 102, 7))) {
        stop("AreaTypeID must be either 101 (Local authority districts and Unitary Authorities), 102 (Counties and Unitary Authorities) or 7 (General Practice).")
    }
    if ((AreaTypeID %in% c(101, 102)) && !(Year %in% c(2010, 2015))) {
        stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
    }
    if (Year %in% c(2010, 2011, 2012)) IndicatorID <- 338
    if (Year == 2015) IndicatorID <- 91872
    if (AreaTypeID == 101) AreaFilter <- "District & UA"
    if (AreaTypeID == 102) AreaFilter <- "County & UA"
    if (AreaTypeID == 7) AreaFilter <- "GP"

The output of pander::pandoc.table() doesn't work in the HTML rendering of the vignette. It gives the following output, which is markdown syntax but in a code block instead of being rendered as HTML.

##
##
## |  &nbsp;  | IndicatorID |              IndicatorName              | ParentCode |
## |:--------:|:-----------:|:---------------------------------------:|:----------:|
## | **5827** |    90362    | 0.1i - Healthy life expectancy at birth | E12000005  |
## | **5828** |    90362    | 0.1i - Healthy life expectancy at birth | E12000006  |
## | **5829** |    90362    | 0.1i - Healthy life expectancy at birth | E12000008  |
## | **5830** |    90362    | 0.1i - Healthy life expectancy at birth | E12000005  |
## | **5831** |    90362    | 0.1i - Healthy life expectancy at birth | E12000008  |
## | **5832** |    90362    | 0.1i - Healthy life expectancy at birth | E12000005  |
...

Interactively selecting indicators vignette

Does the word 'significantly' in this sentence imply that a statistical significance test has been applied? If not then less loaded term may be better, e.g. 'notably', or an explanation of how the indicators are chosen.

highlighting the areas for each indicator that are deteriorating and are already significantly worse than the England (or parent) value

library(fingertipsR)
library(ggplot2)
inds <- select_indicators()

Have you considered wrapping the geom_tile() plot as a built-in plot function? If so, it would be worth refining the design. Points might be easier to read than tiles.

Function documentation

In general it would be helpful to make it clear that the named government departments are in the UK.

area_types
category_types
deprivation_decile
fingertips_data
fingertips_redred
indicator_areatypes
indicator_metadata
indicators
profiles
select_indicators

area_types()

Needs an example that uses the second argument AreaTypeID. This would be a good place to explain how area types relate to parent area types, or at least link to an explanation.

category_types

Please could it return its result visibly (i.e. category_types() in the console should print its output to the console).

Should have an example, even if it seems trivial.

deprivation_decile

The restriction on the Year argument to the years 2010 and 2015 isn't reflected in the code, which also accepts 2011 and 2012.

It would be helpful to add a Details section to explain what this function actually does (downloads a dataset the usual way, filters it, renames and recalculates a couple of columns). Presumably there is a good reason not to just download the deprivation data oneself?

fingertips_data

The different IDs are a bit overwhelming. Is there a way to explain how they relate to one another? Perhaps a drawing would help.

Would you reconsider the stringsAsFactors argument? I am not sure that it is useful for strings to be converted to factors in this case. If it is useful, then it would be more useful (and less dangerous) if they were converted to a standard sets of factor levels that were guaranteed to included levels that might be missing from the data. For example, if an area were missing from a particular dataset, it should still have a factor level, so that the data could safely be joined to another dataset of the same areas.

What do the defaults 6, 101 and 102 translate to?

Typo in the examples: trailing '#.

     fingdata <- fingertips_data(DomainID = doms)#'

The matter of polarity could be explained more. I think it's something to do with whether a bigger number is better or worse, but I don't see why that leads to duplicates in the example.

fingertips_redred

Does the word 'significantly' imply a statistical test is performed? If not, a less loaded term would be better, e.g. 'notably', or 'quite a bit'.

If Comparator = "Parent", does the spot test still relate to "England" as per the text?

The @return heading is not being rendered correctly.

indicator_areatypes

This function loads a local dataset. Is it not available from the API? Otherwise it would make more sense (to me!) to explort the dataset than to wrap it in this function. Though I can see that it is consistent with the other functions that return datasets (and that accept arguments).

indicator_metadata

Good.

indicators

Is the title correct? "Profiles".

profiles

Good.

select_indicators

Nice function :)

Community guidelines / Code of Conduct

Please could you add guidelines or a code of conduct in the README or a CONTRIBUTING file or a CODE_OF_CONDUCT.md file, and a Maintainer: tag in the DESCRIPTION file. See the packaging_guide.

We recommend that you use a code of conduct such as the Contributor Covenant in developing your package. You can document your code of conduct in a CODE_OF_CONDUCT.md or CONDUCT.md file in the package root directory, and linking to this file from the README.md file. devtools::use_code_of_conduct() will add the Contributor Covenant template to your package.

NEWS

Please follow the packaging guide advice on putting CRAN release dates in the NEWS file, and tagging the release on GitHub.

DESCRIPTION

If the UK Government owns the copyright on this package then that should be mentioned in the DESCRIPTION.

Tests

Please could more tests be added? Some lines of code are untested so far, as the following code will show. This is fair enough for the shiny function select_indicators(), but would be straightforward to hit 100% of the other functions. The select_indicators() function can be excluded from coverage statistics by using the comments # nocov start and # nocov end.

x <- covr::package_coverage()
covr::zero_coverage(x)

goodpractice

There are a lot of long lines, and I am one of those people who likes to fit other windows on the screen besides the editor ;) Sometimes it's unavoiadable because of the API calls, but in most cases function arguments could flow onto another line, or inline sequences of piped functions could be wrapped in their own function.

General comments

It would be well worth asking a copy editor, or someone of that ilk, to edit the docs and vignettes. It just needs a bit of tidying up to make it easier to read, e.g. changing Other data extract functions to Other data extraction functions.

fingertips_redred(), indicator_metadata() return tibbles, which is very nice. Would you consider also returning tibbles from the other functions? area_types(), category_types(), indicator_areatypes(), deprivation_decile(), fingertips_data(), areatypes_by_indicators, indicators() and profiles()?

area_types and areatypes are both used in function names. Maybe not worth changing now because backwards compatibility is important and the package is already widely used.

Please can you remove the commented-out code in retrieve_data.R.

Not a question about the package, but for Public Health England -- why isn't http://fingertips.phe.org.uk/ on a gov.uk domain?

I hope this review is helpful. Please ask any questions you have.

Time spent reviewing

6 hours

carlganz commented 6 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

The fingerTipsR package does an excellent job providing easy access to a substantial amount of data. It even provides a nifty Shiny gadget for interactive selection of data. I think there are a few areas where the package can be improved, but overall the package is high quality.

Documentation

The readme is a little sparse. Please add a statement of need. A more detailed description of what the Fingertips data source is would be nice. How is the data collected? What is scope of dataset? Ideally, a person who has never heard of the Fingertips dataset should be able to read the readme and then have a basic understanding of what the dataset is.

Vignettes do a good job explaining how to use the package. Echoing @nacnudus flipping the "Deprivation" and "Extracting the data" sections from the "Life expectancy by deprivation" vignette would make the document flow better.

The indicator2 function has no documentation.

category_types and indicator_areatypes do not have examples.

For functions that take a long time to run, it would be wise to add a note to documentation saying so.

Functionality

The Fingertips dataset is pretty substantial. This makes certain functionality pretty slow. There are a few ways the code could be optimized. These are simply suggestions on how to improve the code. I will still recommend approving this package even if you don't incorporate these changes.

data.table::rbindlist is functionally equivalent to dplyr::bind_rows. Since dplyr is already imported may as well use bind_rows

This is also functionally equivalent to dplyr::bind_rows, but is much slower. This would also remove the purrr dependency, but I'm about to recommend using purrr for something else.

@noamross has a good write up about why this idiom is memory inefficient in section 4 here

Here is one example in the package: https://github.com/PublicHealthEngland/fingertipsR/blob/master/R/indicators.R#L42

There are a few different ways of dealing with this. The purrr based solution would look something like this:

df <- DomainId %>% map_dfr(function(dom) {
          dfRaw <- paste0(path,"indicator_metadata/by_group_id?group_ids=",dom) %>%
                        GET %>%
                        content("text") %>%
                        fromJSON(flatten = TRUE)
                if (length(dfRaw) != 0){
                        dfRaw <- unlist(dfRaw, recursive = FALSE)
                        dfIDs <- dfRaw[grepl("IID", names(dfRaw))]
                        names(dfIDs) <- gsub(".IID","",names(dfIDs))
                        dfDescription <- unlist(dfRaw[grepl("Descriptive", names(dfRaw))],
                                                recursive = FALSE)
                        dfDescription <- dfDescription[grepl("NameLong", names(dfDescription))]
                        names(dfDescription) <- gsub(".Descriptive.NameLong","",names(dfDescription))
                        commonNames <- intersect(names(dfIDs), names(dfDescription))
                        dfIDs <- dfIDs[commonNames]
                        dfDescription <- dfDescription[commonNames]

                        data.frame(IndicatorID = unlist(dfIDs),
                                         IndicatorName = unlist(dfDescription),
                                         DomainID = dom,
                                         row.names=NULL)
                }
})

This could also be done with lapply.

Doesn't return anything.

Hopefully this review was helpful.

Kind Regards,

Carl Ganz

noamross commented 6 years ago

Thank you for your thorough reviews, @nacnudus and @carlganz! @sebastian-fox, at this point we generally aim for a response to reviewers within two weeks. This can be fully implementing recommended changes or just a written response of the changes you plan to make and a time frame (example).

I would re-emphasize @nacnudus's comment on thinking about the README and other documentation as first points of entry for users. We try to follow this mentality throughout package documentation. I also strongly suggest you follow @carlganz's recommendations on performance and reducing dependencies.

sebastian-fox commented 6 years ago

Hi @noamross , @carlganz and @nacnudus ,

Firstly, a massive thank you for taking your time to look at this package. I can see you have given it an extremely thorough review and I've learnt a lot from your comments, so thank you.

I'll respond to question first, then what I have/haven't addressed:

Question

Why isn't Fingertips on a .gov.uk website? I'm afraid I don't know the answer (I'm not in any of the teams that would be involved in those discussions). My guess is that it is a legacy issue. PHE was formed relatively recently and Fingertips was popular before then, so there are probably challenges to move it across.

Addressed

General
README
DESCRIPTION
NEWS
LE Vignette
Interactively selecting indicators vignette
area_types
category_types
deprivation_decile
fingertips_data
fingertips_redred
indicator_areatypes
indicators
select_indicators
indicator2
sebastian-fox commented 6 years ago

I should have mentioned - all the changes have been made to v0.1.3.9000. I should submit the package to CRAN again soon because of the change in the indicator_areatypes() function now pulling data from the API. I'll wait for any further feedback from you before doing so.

nacnudus commented 6 years ago

@sebastian-fox thanks for making all those changes -- that's a lot of work! There are a couple of small leftovers, but in general I think this is ready to go now.

sebastian-fox commented 6 years ago

Hi @nacnudus ,

Thanks for the further comments - all really helpful again. I've made all the changes now:

And this time I've checked that the package builds.

Thanks again

noamross commented 6 years ago

Thanks for your revisions and response, @sebastian-fox and @nacnudus. @carlganz, let us know whether the changes address your comments.

nacnudus commented 6 years ago

Thanks @sebastian-fox, that's everything from me.

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

carlganz commented 6 years ago

Final approval (post-review) The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Congrats @sebastian-fox

noamross commented 6 years ago

Approved! 🎉 Thank you for submitting and @nacnudus and @carlganz for thorough and timely reviews. To-dos:

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

stefaniebutland commented 6 years ago

Hello @sebastian-fox. I'm rOpenSci's Community Manager. Happy to answer any questions you might have about contributing a post for the rOpenSci blog.

sebastian-fox commented 6 years ago

Thanks all - I can see a lot of work goes into this from your end so this is really appreciated.

I will transfer the package over as soon as I can. I'm don't have admin access currently to my repo so I can't see the settings option. I need to negotiate admin rights, so hopefully that won't take too long.

Thanks again

sebastian-fox commented 6 years ago

Hi @noamross - I've completed your checklist. Thanks for your help. I'll start thinking about blogs :)

noamross commented 6 years ago

Great! Welcome aboard! I will close this issue, you'll find instructions on submitting a blog post over at https://github.com/ropensci/roweb2#contributing-a-blog-post .

sebastian-fox commented 6 years ago

Hello, just wondering how to get the rOpenSci badge status updated from "Under review" to "peer reviewed" for fingertipsR on the README file?

noamross commented 6 years ago

We see it as "Peer Reviewed". You may need to clear your browser cache.

sebastian-fox commented 6 years ago

Thanks - that did the trick :)