Closed sebastian-fox closed 6 years ago
👋 sebastian-fox
Thank you for your submission to rOpenSci's onboarding. I've tagged your submission as a presubmission
while I do some preliminary checks before I assign an editor.
Thank you for your submission, @sebastian-fox ! I see you are running the package on Travis already, but we do require a CI badge in your README as well as a continuous code coverage service (generally codecov or coveralls) and a badge from that, as well, so we can see package status at a glance. I'm seeking reviewers and will assign them once you've made this update.
Here is output from goodpractice::gp()
. I see nothing sufficient to hold up assigning reviewers, but 58% test coverage is a bit on the low side, and the rest should be handled by the end of review. Things like line length and whole-package imports are up to the judgement of reviewers and this serves a reference for them and you.
── GP fingertipsR ────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 58% of code lines
are covered by test cases.
R/area_types.R:36:NA
R/area_types.R:93:NA
R/area_types.R:94:NA
R/area_types.R:95:NA
R/area_types.R:96:NA
... and 181 more lines
✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and
developers are used it and it is easier to read your code for them if you use '<-'.
R/select_indicators.R:48:33
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor
windows that are about 80 characters wide. Try make your lines shorter than 80 characters
R/area_types.R:36:1
R/area_types.R:61:1
R/area_types.R:66:1
R/area_types.R:69:1
R/deprivation_decile.R:38:1
... and 103 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on
the input data. Consider using vapply() instead.
R/fingertips_data.R:187:33
R/fingertips_data.R:188:48
tests/testthat/test-deprivation.R:21:22
tests/testthat/test-deprivation.R:22:22
tests/testthat/test-deprivation.R:23:22
... and 1 more lines
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...)
expressions. They are error prone and result 1:0 if the expression on the right hand side is
zero. Use seq_len() or seq_along() instead.
R/retrieve_data.R:10:19
✖ not import packages as a whole, as this can cause name clashes between the imported
packages. Instead, import only the specific functions you need.
──────────────────────────────────────────────
Reviewers: @carlganz, @nacnudus Due date: 2018-01-15
Thanks for taking a look at this. I have added the badges as requested. I have also improved the coverage of the package.
I've also updated some of the code based on the goodpractice feedback
Reviewers are @carlganz and @nacnudus. In light of the end-of-year holidays, due date is 2018-01-15.
@sebastian-fox You can add our Peer Review badge to your README. It will read "in-progress" and update after acceptance:
[![](https://badges.ropensci.org/168_status.svg)](https://github.com/ropensci/onboarding/issues/168)
Hi @sebastian-fox @noamross, here's my review.
By Duncan Garmonsway @nacnudus
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing:
The fingertipsR package provides programmatic access to data published by Public Health England on the Fingertips website. This is useful in its own right, but especially because the downloads offered by the website appear to be exclusively as spreadsheets.
The package has some fairly heavy dependencies (both dplyr and data.table), but makes the most of shiny and related packages to provide a searchable index of datasets.
The vignettes demonstrate common usage and are easy to follow, and data structure is self-explanatory once it has been retrieved.
The introductory statement explains that the package is for interacting with a data tool called 'Fingertips'. It would be helpful if it also explained what the Fingertips data tool is.
The first installation instruction requires the devtools package. Please could you also give instructions that only use base R, e.g. https://stackoverflow.com/a/1474125/937932. Please can you also add an instruction for installing from CRAN, and make it clear that there is a difference, e.g. 'stable' vs 'development'.
Please can the example in the README be described in full. The current text is generic:
This is a basic example which shows you how to solve a common problem:
The instruction for browsing vignettes only works if the vignettes have been installed, which is not the default. If using devtools, then the call should be
devtools::install_github("PublicHealthEngland/fingertipsR",
build_vignettes = TRUE,
dependencies = "suggests")
I read this vignette before reading any of the function documentation, so some of my points are answered in the docs (but it would still be helpful not to have to go to the docs for the answers).
The .md
file committed to GitHub is out of date.
There seems to be a line missing
We need to begin by calling the fingertipsR package:
Should it be followed by
library(fingertipsR)
?
In the code chunk under 'IndicatorID', the comment beginning # Because the same indicators ...
would be easier to read (and render more nicely as html!) if it were wrapped to lines of 80 characters as below. It also mentions 'profiles' without defining what profiles are. That is defined on the Fingertips website, but it would be helpful to repeat some information here for users who have come straight here. They might come here first if they are maintaining code written by others. From the packaging_guide:
- If your package connects to a data source or online service, or wraps other software, consider that your package README may be the first point of entry for users. It should provide enough information for users to understand the nature of the data, service, or software, and provide links to other relevant data and documentation. For instance, a README should not merely read, "Provides access to GooberDB," but also include, "..., an online repository of Goober sightings in South America. More information about GooberDB, and documentation of database structure and metadata can be found at link.
If filtering the output of indicators()
is a common workflow (code chunk below), then it might help users if there was a built-in function to return one record per IndicatorID
.
inds <- indicators()
life_expectancy <- inds[grepl("life expectancy", tolower(inds$IndicatorName)),]
# Because the same indicators are used in multiple profiles, there are many
# repeated indicators in this table (some with varying IndicatorName but same
# IndicatorID)
# This returns a record for each IndicatorID
life_expectancy <-
unique(life_expectancy[duplicated(life_expectancy$IndicatorID) == FALSE,
c("IndicatorID", "IndicatorName")])
knitr::kable(life_expectancy, row.names = FALSE)
areaTypes()$AreaTypeID
is numeric vector, but only contains integers. Would you consider using an integer vector instead. Similarly areaTypes()$ParentAreaTypeID
Please could you explain the relationship between AreaTypeID
and ParentAreaTypeID
. It is another thing that is probably explained on the Fingertips website, but that would be convenient to know at this point in the vignette.
Typo: though it change
should be though it changes
.
The quoted paragraph below begs the question, are there any other defaults, and are they guessable? What's the rationale? Also, what do 101
, 102
and 6
translate to?
ParentAreaTypeID is 6 by default for the fingertips_data() function for AreaTypeID of 102 (though it change if different AreaTypeIDs are entered), so we can stick with that in this example.
Typo in the docs for deprivation_decile()
. Outputs a data frame allocating deprivation decile to area code base on
should be based on
.
The deprivation section interrupts the workflow from finding the IDs to downloading the data with fingertips_data()
. It would flow better to finish downloading the data before seeking the deprivation deciles.
It also took me a while to understand what deprivation_decile()
does. It downloads a dataset that is already available in Fingertips the usual way, filters it for the given IDs, renames a column nd recalculaties the decile column. It would be helpful to explain somewhere why the deprivation decile data gets this special treatment.
Should the error message below be "Year must be 2010, 2011, 2012 or 2015"
?
if (!(Year %in% c(2010, 2011, 2012, 2015))) {
stop("Year must be either 2010 or 2015")
}
The style of the following if
statement could be improved by simplifying.
if (!(Year %in% c(2010, 2011, 2012, 2015))) {
stop("Year must be either 2010 or 2015")
}
else if (Year %in% c(2010, 2011, 2012)) {
IndicatorID <- 338
}
else if (Year == 2015) {
IndicatorID <- 91872
}
if (AreaTypeID == 101) {
if (Year %in% c(2010, 2015)) {
AreaFilter <- "District & UA"
}
else if (Year %in% c(2011, 2012)) {
stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
}
}
else if (AreaTypeID == 102) {
if (Year %in% c(2010, 2015)) {
AreaFilter <- "County & UA"
}
else if (Year %in% c(2011, 2012)) {
stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
}
}
else if (AreaTypeID == 7) {
AreaFilter <- "GP"
}
else {
stop("AreaTypeID must be either 101 (Local authority districts and Unitary Authorities), 102 (Counties and Unitary Authorities) or 7 (General Practice).")
}
Here is a shorter version that I find easier to follow.
if (!(Year %in% c(2010, 2011, 2012, 2015))) {
stop("Year must be either 2010 or 2015")
}
if (!(AreaTypeID %in% c(101, 102, 7))) {
stop("AreaTypeID must be either 101 (Local authority districts and Unitary Authorities), 102 (Counties and Unitary Authorities) or 7 (General Practice).")
}
if ((AreaTypeID %in% c(101, 102)) && !(Year %in% c(2010, 2015))) {
stop("Year must be either 2010 or 2015 for AreaTypeID of 101 or 102")
}
if (Year %in% c(2010, 2011, 2012)) IndicatorID <- 338
if (Year == 2015) IndicatorID <- 91872
if (AreaTypeID == 101) AreaFilter <- "District & UA"
if (AreaTypeID == 102) AreaFilter <- "County & UA"
if (AreaTypeID == 7) AreaFilter <- "GP"
The output of pander::pandoc.table()
doesn't work in the HTML rendering of the vignette. It gives the following output, which is markdown syntax but in a code block instead of being rendered as HTML.
##
##
## | | IndicatorID | IndicatorName | ParentCode |
## |:--------:|:-----------:|:---------------------------------------:|:----------:|
## | **5827** | 90362 | 0.1i - Healthy life expectancy at birth | E12000005 |
## | **5828** | 90362 | 0.1i - Healthy life expectancy at birth | E12000006 |
## | **5829** | 90362 | 0.1i - Healthy life expectancy at birth | E12000008 |
## | **5830** | 90362 | 0.1i - Healthy life expectancy at birth | E12000005 |
## | **5831** | 90362 | 0.1i - Healthy life expectancy at birth | E12000008 |
## | **5832** | 90362 | 0.1i - Healthy life expectancy at birth | E12000005 |
...
Does the word 'significantly' in this sentence imply that a statistical significance test has been applied? If not then less loaded term may be better, e.g. 'notably', or an explanation of how the indicators are chosen.
highlighting the areas for each indicator that are deteriorating and are already significantly worse than the England (or parent) value
library(fingertipsR)
library(ggplot2)
inds <- select_indicators()
Have you considered wrapping the geom_tile()
plot as a built-in plot function? If so, it would be worth refining the design. Points might be easier to read than tiles.
In general it would be helpful to make it clear that the named government departments are in the UK.
area_types
category_types
deprivation_decile
fingertips_data
fingertips_redred
indicator_areatypes
indicator_metadata
indicators
profiles
select_indicators
area_types()
Needs an example that uses the second argument AreaTypeID
. This would be a good place to explain how area types relate to parent area types, or at least link to an explanation.
category_types
Please could it return its result visibly (i.e. category_types()
in the console should print its output to the console).
Should have an example, even if it seems trivial.
deprivation_decile
The restriction on the Year
argument to the years 2010 and 2015 isn't reflected in the code, which also accepts 2011 and 2012.
It would be helpful to add a Details
section to explain what this function actually does (downloads a dataset the usual way, filters it, renames and recalculates a couple of columns). Presumably there is a good reason not to just download the deprivation data oneself?
fingertips_data
The different IDs are a bit overwhelming. Is there a way to explain how they relate to one another? Perhaps a drawing would help.
Would you reconsider the stringsAsFactors
argument? I am not sure that it is useful for strings to be converted to factors in this case. If it is useful, then it would be more useful (and less dangerous) if they were converted to a standard sets of factor levels that were guaranteed to included levels that might be missing from the data. For example, if an area were missing from a particular dataset, it should still have a factor level, so that the data could safely be joined to another dataset of the same areas.
What do the defaults 6
, 101
and 102
translate to?
Typo in the examples: trailing '#
.
fingdata <- fingertips_data(DomainID = doms)#'
The matter of polarity could be explained more. I think it's something to do with whether a bigger number is better or worse, but I don't see why that leads to duplicates in the example.
fingertips_redred
Does the word 'significantly' imply a statistical test is performed? If not, a less loaded term would be better, e.g. 'notably', or 'quite a bit'.
If Comparator = "Parent"
, does the spot test still relate to "England"
as per the text?
The @return
heading is not being rendered correctly.
indicator_areatypes
This function loads a local dataset. Is it not available from the API? Otherwise it would make more sense (to me!) to explort the dataset than to wrap it in this function. Though I can see that it is consistent with the other functions that return datasets (and that accept arguments).
indicator_metadata
Good.
indicators
Is the title correct? "Profiles".
profiles
Good.
select_indicators
Nice function :)
Please could you add guidelines or a code of conduct in the README or a CONTRIBUTING file or a CODE_OF_CONDUCT.md file, and a Maintainer:
tag in the DESCRIPTION file. See the
packaging_guide.
We recommend that you use a code of conduct such as the Contributor Covenant in developing your package. You can document your code of conduct in a CODE_OF_CONDUCT.md or CONDUCT.md file in the package root directory, and linking to this file from the README.md file. devtools::use_code_of_conduct() will add the Contributor Covenant template to your package.
Please follow the packaging guide advice on putting CRAN release dates in the NEWS file, and tagging the release on GitHub.
If the UK Government owns the copyright on this package then that should be mentioned in the DESCRIPTION.
Please could more tests be added? Some lines of code are untested so far, as the following code will show. This is fair enough for the shiny function select_indicators()
, but would be straightforward to hit 100% of the other functions. The select_indicators()
function can be excluded from coverage statistics by using the comments # nocov start
and # nocov end
.
x <- covr::package_coverage()
covr::zero_coverage(x)
There are a lot of long lines, and I am one of those people who likes to fit other windows on the screen besides the editor ;) Sometimes it's unavoiadable because of the API calls, but in most cases function arguments could flow onto another line, or inline sequences of piped functions could be wrapped in their own function.
It would be well worth asking a copy editor, or someone of that ilk, to edit the docs and vignettes. It just needs a bit of tidying up to make it easier to read, e.g. changing Other data extract functions
to Other data extraction functions
.
fingertips_redred()
, indicator_metadata()
return tibbles, which is very nice. Would you consider also returning tibbles from the other functions? area_types()
, category_types()
, indicator_areatypes()
, deprivation_decile()
, fingertips_data()
, areatypes_by_indicators
, indicators()
and profiles()
?
area_types
and areatypes
are both used in function names. Maybe not worth changing now because backwards compatibility is important and the package is already widely used.
Please can you remove the commented-out code in retrieve_data.R
.
Not a question about the package, but for Public Health England -- why isn't http://fingertips.phe.org.uk/ on a gov.uk domain?
I hope this review is helpful. Please ask any questions you have.
6 hours
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
devtools::install_github
should probably include build vignette option.indicators2
. What is this function?select_indicator
should probably have a note about the long loading timecategory_types
, indicator_areatypes
; also category_types
doesn't workURL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
Estimated hours spent reviewing: 5
The fingerTipsR
package does an excellent job providing easy access to a substantial amount of data. It even provides a nifty Shiny gadget for interactive selection of data. I think there are a few areas where the package can be improved, but overall the package is high quality.
The readme is a little sparse. Please add a statement of need. A more detailed description of what the Fingertips data source is would be nice. How is the data collected? What is scope of dataset? Ideally, a person who has never heard of the Fingertips dataset should be able to read the readme and then have a basic understanding of what the dataset is.
Vignettes do a good job explaining how to use the package. Echoing @nacnudus flipping the "Deprivation" and "Extracting the data" sections from the "Life expectancy by deprivation" vignette would make the document flow better.
The indicator2
function has no documentation.
category_types
and indicator_areatypes
do not have examples.
For functions that take a long time to run, it would be wise to add a note to documentation saying so.
The Fingertips dataset is pretty substantial. This makes certain functionality pretty slow. There are a few ways the code could be optimized. These are simply suggestions on how to improve the code. I will still recommend approving this package even if you don't incorporate these changes.
data.table
dependencydata.table::rbindlist
is functionally equivalent to dplyr::bind_rows
. Since dplyr
is already imported may as well use bind_rows
purrr::map_df(rbind)
This is also functionally equivalent to dplyr::bind_rows
, but is much slower. This would also remove the purrr
dependency, but I'm about to recommend using purrr
for something else.
for
loop@noamross has a good write up about why this idiom is memory inefficient in section 4 here
Here is one example in the package: https://github.com/PublicHealthEngland/fingertipsR/blob/master/R/indicators.R#L42
There are a few different ways of dealing with this. The purrr
based solution would look something like this:
df <- DomainId %>% map_dfr(function(dom) {
dfRaw <- paste0(path,"indicator_metadata/by_group_id?group_ids=",dom) %>%
GET %>%
content("text") %>%
fromJSON(flatten = TRUE)
if (length(dfRaw) != 0){
dfRaw <- unlist(dfRaw, recursive = FALSE)
dfIDs <- dfRaw[grepl("IID", names(dfRaw))]
names(dfIDs) <- gsub(".IID","",names(dfIDs))
dfDescription <- unlist(dfRaw[grepl("Descriptive", names(dfRaw))],
recursive = FALSE)
dfDescription <- dfDescription[grepl("NameLong", names(dfDescription))]
names(dfDescription) <- gsub(".Descriptive.NameLong","",names(dfDescription))
commonNames <- intersect(names(dfIDs), names(dfDescription))
dfIDs <- dfIDs[commonNames]
dfDescription <- dfDescription[commonNames]
data.frame(IndicatorID = unlist(dfIDs),
IndicatorName = unlist(dfDescription),
DomainID = dom,
row.names=NULL)
}
})
This could also be done with lapply
.
category_types
Doesn't return anything.
Hopefully this review was helpful.
Kind Regards,
Carl Ganz
Thank you for your thorough reviews, @nacnudus and @carlganz! @sebastian-fox, at this point we generally aim for a response to reviewers within two weeks. This can be fully implementing recommended changes or just a written response of the changes you plan to make and a time frame (example).
I would re-emphasize @nacnudus's comment on thinking about the README and other documentation as first points of entry for users. We try to follow this mentality throughout package documentation. I also strongly suggest you follow @carlganz's recommendations on performance and reducing dependencies.
Hi @noamross , @carlganz and @nacnudus ,
Firstly, a massive thank you for taking your time to look at this package. I can see you have given it an extremely thorough review and I've learnt a lot from your comments, so thank you.
I'll respond to question first, then what I have/haven't addressed:
Why isn't Fingertips on a .gov.uk website? I'm afraid I don't know the answer (I'm not in any of the teams that would be involved in those discussions). My guess is that it is a legacy issue. PHE was formed relatively recently and Fingertips was popular before then, so there are probably challenges to move it across.
purrr
and data.table
(thanks for the bind_rows
tip - I've not used that before)lapply
functions except in retrieve_data.R
file as that's a little more complex (in my eyes at least) as I don't currently have the time. I've added it as an issuecovr
instructions @nacnudus provided, but I'm sure I am testing it!)Maintainer:
addedpander::pandoc.table()
. I chose this because I felt this was the clearest way of displaying this dataset with so many fields. I wanted to avoid left-right scrolling, but that was just my style. Any other suggestions are welcomed thoughgeom_tile
changed to geom_point
- nice suggestion. I'm not keen to provide a function for that chart at the moment. Perhaps if the package evolves to start displaying standard chartsAreaTypeID
example6
and 101
there to explain them - apologies if I've overlooked themstringsAsFactors
removed@return
renders correctly nowI should have mentioned - all the changes have been made to v0.1.3.9000. I should submit the package to CRAN again soon because of the change in the indicator_areatypes()
function now pulling data from the API. I'll wait for any further feedback from you before doing so.
@sebastian-fox thanks for making all those changes -- that's a lot of work! There are a couple of small leftovers, but in general I think this is ready to go now.
R/fingertips_data.R
) is one that I don't think will ever be executed, because an earlier, equivalent test in the same function will fail first. It could be tidied up, but I don't think it's a blocker for publication.accesibility
should be accessibility
.```{r CRAN-install, eval=FALSE}
install.packages("fingertipsR")
deprivation_decile()
the test tests/testthat/test-deprivation.R
needs updating too, and then the build will succeed. Line 32 should be
expect_error(deprivation_decile(Year = 2014), "Year must be either 2010, 2011, 2012 or 2015")
101
, 102
and 7
defaults have now been explained (perhaps 6
was changed
to 7
?). Anyway, it makes much more sense to me now, thanks.Hi @nacnudus ,
Thanks for the further comments - all really helpful again. I've made all the changes now:
fingertips_data()
as it was being tested on line 91deprivation_decile()
testAnd this time I've checked that the package builds.
Thanks again
Thanks for your revisions and response, @sebastian-fox and @nacnudus. @carlganz, let us know whether the changes address your comments.
Thanks @sebastian-fox, that's everything from me.
The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Final approval (post-review) The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Congrats @sebastian-fox
Approved! 🎉 Thank you for submitting and @nacnudus and @carlganz for thorough and timely reviews. To-dos:
[x] Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
[x] Add the rOpenSci footer to the bottom of your README
[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.
Hello @sebastian-fox. I'm rOpenSci's Community Manager. Happy to answer any questions you might have about contributing a post for the rOpenSci blog.
Thanks all - I can see a lot of work goes into this from your end so this is really appreciated.
I will transfer the package over as soon as I can. I'm don't have admin access currently to my repo so I can't see the settings option. I need to negotiate admin rights, so hopefully that won't take too long.
Thanks again
Hi @noamross - I've completed your checklist. Thanks for your help. I'll start thinking about blogs :)
Great! Welcome aboard! I will close this issue, you'll find instructions on submitting a blog post over at https://github.com/ropensci/roweb2#contributing-a-blog-post .
Hello, just wondering how to get the rOpenSci badge status updated from "Under review" to "peer reviewed" for fingertipsR
on the README file?
We see it as "Peer Reviewed". You may need to clear your browser cache.
Thanks - that did the trick :)
Summary
What does this package do? (explain in 50 words or less): The package provides an interface to the API for the Fingertips website, a site developed by Public Health England. It enables users to import data on over 2,000 indicators of public health in England at different levels of geography.
Paste the full DESCRIPTION file inside a code block below:
https://github.com/PublicHealthEngland/fingertipsR
Data retrieval because it provides users with functions to import data for over 2,000 indicators of public health in England from the Fingertips website.
The target audience is anyone with an interest in public health. This includes researchers, public and private sector workers and the general public. The website gets almost one million hits in a year. The majority of users are in the public sector who need to make decisions that affect the public's health. This website helps inform those decisions. Increasingly, the website is being accessed by academic institutions. The fingertipsR package is already on CRAN and it is being discussed in some institutions as a future resource for teaching materials.
No
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
NA
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Detail
[x] Does
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: