Closed: @Robinlovelace closed this issue 5 years ago
thanks for your submission @Robinlovelace We are discussing now and will get back to you soon
Great. It's in active development as you'll see. We have code to make all those millions of crash points sf data frames, we've just not pushed it. @layik has done great work refactoring the download functions so it makes it much easier to access this important dataset. Looking forward to hearing back from you and thanks for the update.
Heads-up @sckott: `format_sf()` is now in there and the package now has a proper vignette. Are there any other things we should add at this stage for peer review? Thanks.
Will dig into it today, I'll let you know.
Thanks for your submission @Robinlovelace !
Here's the output from goodpractice. No need to address these now, it's info for reviewers to use to get started.
── GP stats19 ─────────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 95% of code lines are covered by test cases.
R/dl_stats19.R:82:NA
R/dl_stats19.R:83:NA
R/format.R:55:NA
R/format.R:106:NA
R/format.R:157:NA
... and 12 more lines
✖ add a "URL" field to DESCRIPTION. It helps users find information about your package online. If your package does not have a homepage, add a URL to GitHub or the CRAN package page.
✖ add a "BugReports" field to DESCRIPTION, and point it to a bug tracker. Many online code hosting services provide bug trackers for free,
https://github.com, https://gitlab.com, etc.
✖ use '<-' for assignment instead of '='. '<-' is the standard, R users and developers are used to it, and it is easier for them to read your code if you use '<-'.
R/dl_stats19.R:28:12
R/dl_stats19.R:30:9
R/dl_stats19.R:31:16
R/dl_stats19.R:32:11
R/dl_stats19.R:34:18
... and 204 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try to make your lines shorter than 80 characters.
R/dl_stats19.R:28:1
R/dl_stats19.R:104:1
R/dl_stats19.R:105:1
R/format.R:42:1
R/format.R:98:1
... and 14 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
R/format.R:257:20
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result in 1:0 if the expression on the right hand side is zero. Use seq_len() or seq_along() instead.
R/format.R:247:11
✖ fix this R CMD check NOTE: Namespaces in Imports field not imported from: ‘dplyr’ ‘foreign’ ‘lubridate’ All declared Imports should be used.
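To illustrate the `vapply()` and `seq_along()` points above, a minimal generic sketch (illustrative code, not from the package itself):

```r
x <- list(a = 1:3, b = 4:6)

# vapply() always returns the declared type; sapply() may return a
# vector or a list depending on the input
lens <- vapply(x, length, integer(1))

# seq_along() is safe for zero-length input, whereas 1:length(list())
# evaluates to 1:0 and would iterate twice
idx <- seq_along(list())  # integer(0)
```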
Seeking reviewers now 🕐
Reviewers:
Thanks @sckott. I'd seen goodpractice but never used it - that is very useful. After a few commits stats19 is improved. Latest results:
── GP stats19 ───────────────────────────────────────────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code in
general. 95% of code lines are covered by test cases.
R/dl_stats19.R:85:NA
R/dl_stats19.R:86:NA
R/format.R:43:NA
R/format.R:88:NA
R/format.R:133:NA
... and 12 more lines
✖ use '<-' for assignment instead of '='. '<-' is the standard, R users and developers are used to it, and it is easier for them to read your code if you use '<-'.
R/dl_stats19.R:28:12
R/dl_stats19.R:33:9
R/dl_stats19.R:34:16
R/dl_stats19.R:35:11
R/dl_stats19.R:37:18
... and 203 more lines
Reviewers assigned:
Thanks to rOpenSci for the invitation to review, and special thanks to Robin, Malcolm, Layik, and Mark for this important package. I'm already a huge fan.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
I would tease a bit more of what's in these data sets. I wasn't entirely sure until I downloaded and opened the supporting documentation. If I were searching for this kind of data, and I didn't know what STATS19 was, I'd like to know I'm in the right place after scanning the README. Maybe a map?
I couldn't load the vignette from the console:
vignette(package = "stats19")
#> no vignettes found
Admittedly, I also couldn't load the vignette for my own rOpenSci package, so I'm not sure what's missing.
Several of the examples failed:
- `read_casualties` -- argument for `dl_stats19` should be `year` instead of `years`
- `read_vehicles` -- I think `year` must be specified and presupposes an existing download of Vehicles data, e.g. `ve = read_vehicles(year = 2017)`
  stats19::read_vehicles()
  #> Error in if (year %in% 1980:2003) {: argument is of length zero
- `check_input_file` -- no example included
- `format_accidents` -- same as above, argument for `dl_stats19` should be `year` instead of `years`
- `format_casualties` -- same as above
- `format_vehicles` -- same as above
- `format_sf` -- `read_accidents` requires a `year` argument
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
I couldn't find any explicit contributing guidelines in the README, and there is no CONTRIBUTING document.
The package contains a paper.md matching JOSS's requirements with:
There is no paper.md.
Test results:
══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════ Duration: 24.2 s
OK: 37 Failed: 0 Warnings: 0 Skipped: 0
- [X] **Packaging guidelines**: The package conforms to the rOpenSci packaging guidelines
#### Final approval (post-review)
- [x] **The author has responded to my review and made changes to my satisfaction. I recommend approving this package.**
Estimated hours spent reviewing: 3.5
---
### Review Comments
> A superb and essential package--we need this data and we need it in these formats. The download-format-read-explore workflow is intuitive and relatively frictionless. I have only some brief comments:
* I wonder if you could possibly merge the formatting and reading steps with a `raw = TRUE` or `format = TRUE` argument in the `read_*` functions. But perhaps that's my tendency towards abstraction. Something like `ac = read_accidents(year = 2017, format = TRUE)`
* My personal preference would be to have the schema from `dl_schema` lazily loaded with the package.
* [According to the vignette,](https://github.com/ITSLeeds/stats19/blob/master/vignettes/stats19.Rmd) the `dl_*` functions are interactive, although the interactivity is commented out in the code. Will the interactivity be returning? Or does the vignette need to be updated?
* Out of curiosity, what's happening with https://github.com/cyipt/stats19? It was updated recently.
* I confess I wish the package name was more expressive--stats19 sounds like an introductory statistics class.
* This data will be used to make many maps. I personally would love a nudge in that direction in either the README or the vignette.
thanks very much for your review @daranzolin
Yeah, thanks @daranzolin for an impressively prompt review. I'll leave the response proper to @layik and @Robinlovelace, but will confess here to having commented out the interactivity in the `dl_*` functions. That's all my doing and I should already have reinstated it. Apologies to all, and yes, it will be returning.
Many thanks for the quick yet detailed review @daranzolin. I'm working on it this week and will get back to you on each of those constructive comments asap. More soon...
Great review @daranzolin. Appreciate it. Thanks @Robinlovelace and @mpadge. Apologies I just missed this thread for some reason.
Heads-up all (especially @daranzolin, @mpadge and @layik - the originator of interactive downloads in stats19): I've re-added interactivity, this time using `menu()`, which is best practice for this kind of thing and uses fewer lines of code.
Please check out the video to see how it works here:
And if you're interested in the code used to do this, see here: https://github.com/ITSLeeds/stats19/pull/36
Feedback on behaviour or code welcome before I merge this. Many thanks @layik for implementing this 1st time around.
Just going through the review @daranzolin. RE the vignette, I think what you needed to do was
devtools::install(build_vignettes = TRUE)
then you would be able to load the vignette.
Heads-up: we've addressed most of the issues raised by @daranzolin, I think. Many thanks for a great set of comments. We're documenting them here https://github.com/ITSLeeds/stats19/issues/49 and plan to put the review responses here: https://github.com/ITSLeeds/stats19/blob/master/responses1.Rmd
Not quite finished, but we plan to respond properly in a comment here by Saturday this week - sorry it's taken a while.
Hi @daranzolin, as promised by @Robinlovelace, find our responses below. Thank you very much for taking the time to do the review, and please feel free to let us know if you have any further feedback.
=====
Thanks for the review. We've had a chance, after making some changes and fixes to the package, to take in and act on each of the comments. The code base has evolved substantially since the review, but the fundamental design of the package, with its 3-stage API mirroring workflows that existed before the package was developed, remains unchanged. That is:
`dl_stats19()` downloads files from the DfT. Good news: we have spoken to the relevant people at the Department for Transport and they assured us that the endpoints are stable. The function now uses `menu()` to provide a menu of download options for any year/type combination, and now finds files beyond those explicitly mentioned in the file names. E.g.:
dl_stats19(year = 2017)
# Multiple matches. Which do you want to download?
#
# 1: dftRoadSafetyData_Vehicles_2017.zip
# 2: dftRoadSafetyData_Casualties_2017.zip
# 3: dftRoadSafetyData_Accidents_2017.zip
dl_stats19(year = 2017, type = "ac")
# Files identified: dftRoadSafetyData_Accidents_2017.zip
#
# Wanna do it (y = enter, n = esc)?
dl_stats19(year = 1985)
# Year not in range, changing to match 1979:2004 data
# This file is over 240 MB in size.
# Once unzipped it is over 1.8 GB.
# Files identified: Stats19-Data1979-2004.zip
#
# Wanna do it (y = enter, n = esc)?
`read_*()` functions remain unchanged, except that the order of arguments has changed. As with `dl_stats19()`, `year` is now the first argument, which is more intuitive.
`format_*()` functions have been refactored. Each now uses `format_stats19()` behind the scenes, reducing duplication. The results are now better: more variables are labelled.
Note: the last two stages are now combined by default, as suggested in this review: `read_*(format = TRUE)` is the default.
We'll focus on areas flagged in the review for the rest of this response:
I would tease a bit more of what's in these data sets. I wasn't entirely sure until I downloaded and opened the supporting documentation. If I were searching for this kind of data, and I didn't know what STATS19 was, I'd like to know I'm in the right place after scanning the README. Maybe a map?
We have added a map (well, technically 9 maps!) and a couple of time series plots showing the scale of the data. Also, a sample of the additional casualty and vehicle tables has been added to show more clearly the richness of the data provided.
I couldn't load the vignette from the console:
We also could not see the vignette when installing with devtools::install_github(build_vignettes = TRUE). But we can see the vignette if we install locally.
This was the code we ran:
devtools::install(build_vignettes = TRUE)
## Installing stats19
vignette(package = "stats19")
Several of the examples failed:
These have now been fixed - thanks for testing and reporting.
I couldn't find any explicit contributing guidelines in the README, and there is no CONTRIBUTING document.
A CONTRIBUTING document has now been added. Thank you.
The package has an obvious research application according to JOSS's definition
There is no paper.md.
One has now been added.
Review Comments
A superb and essential package--we need this data and we need it in these formats. The download-format-read-explore workflow is intuitive and relatively frictionless. I have only some brief comments:
Thank you.
I wonder if you could possibly merge the formatting and reading steps with a raw = TRUE or format = TRUE argument in the read_* functions. But perhaps that's my tendency towards abstraction. Something like ac = read_accidents(year = 2017, format = TRUE)
Done, appreciate your input.
My personal preference would be to have the schema from dl_schema lazily loaded with the package.
DESCRIPTION has a LazyData line, which means stats19_schema is lazily loaded.
According to the vignette, the dl_* functions are interactive, although the interactivity is commented out in the code. Will the interactivity be returning? Or does the vignette need to be updated?
Back in, as stated above.
Out of curiosity, what's happening with https://github.com/cyipt/stats19? It was updated recently.
@mem48 answered this: cyipt/stats19 is not actually a proper R package. It is a repo containing scripts for the CyIPT project; it has different sources (UK DS) and usage, so there is no current need to adapt it to this package. Malcolm is one of the contributors to this package.
I confess I wish the package name was more expressive--stats19 sounds like an introductory statistics class.
This is a reasonable point that we have thought about and discussed. We are open-minded about changing the name but, as with so many things, there are +s and -s (outlined for some options below):
stats19data
roadcrashesUK
roadSafetyData
The main benefit we can see of changing the name would be making the package easier to find. We think good documentation, a clear description and some write-ups of the package and what it does could address these issues. We've explored the stats19 name and it links directly to (and is almost synonymous with) road crash data. See https://en.wikipedia.org/wiki/STATS19 for an excellent example (we plan to add this link to the README), so we think the name is OK, but we're open-minded about the alternative names mentioned above and perhaps names we've not thought of.
This data will be used to make many maps. I personally would love a nudge in that direction in either the README or the vignette.
Definitely. Thank you very much for your input.
Quick question for @sckott: what is your policy regarding CRAN submission while a package is under review? We may change this in response to the answer (or we submit tomorrow!): https://github.com/ITSLeeds/stats19/milestone/1.
Look forward to your response @adamhsparks.
Thanks @Robinlovelace @layik @mpadge -- the new vignette and README look great. @sckott, all of my comments have been addressed.
One last observation: I'm getting parsing failures on the `Accident_Index` column with both `read_vehicles` and `read_casualties`:
> vehicles = read_vehicles()
Warning: 298687 parsing failures.
row col expected actual file
1 Acc_Index no trailing characters BS70001 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
2 Acc_Index no trailing characters BS70001 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
3 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
4 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
5 Acc_Index no trailing characters BS70003 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Vehicles_2009/ [... truncated]
Warning message:
The following named parsers don't match the column names: Accident_Index
> casualties = read_casualties()
Warning: 222146 parsing failures.
row col expected actual file
1 Acc_Index no trailing characters BS70001 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
2 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
3 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
4 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
5 Acc_Index no trailing characters BS70002 '/var/folders/j4/lb613tmj3_xd5jj5n11dprz04kdcr4/T//RtmpWy104K/DfTRoadS [... truncated]
Warning message:
The following named parsers don't match the column names: Accident_Index
Is it because the actual column name is `acc_index` instead of `Accident_Index`?
Thanks @daranzolin, which years do you get parsing failures for? I cannot reproduce this for the 2017 data, as illustrated in the reprex below:
devtools::install_github("ITSLeeds/stats19")
#> Skipping install of 'stats19' from a github remote, the SHA1 (225bce03) has not changed since last install.
#> Use `force = TRUE` to force installation
library(stats19)
#> Data provided under the conditions of the Open Government License.
#> If you use data from this package, mention the source
#> (Department for Transport), cite the package and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
# casualties
dl_stats19(year = 2017, type = "casualties")
#> Files identified: dftRoadSafetyData_Casualties_2017.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Casualties_2017.zip
#> Data saved at /tmp/Rtmpw3fpFE/dftRoadSafetyData_Casualties_2017/Cas.csv
ca = read_casualties(year = 2017)
head(ca$accident_index)
#> [1] "2017010001708" "2017010001708" "2017010001708" "2017010009342"
#> [5] "2017010009344" "2017010009348"
# vehicles
dl_stats19(year = 2017, type = "vehicles")
#> Files identified: dftRoadSafetyData_Vehicles_2017.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Vehicles_2017.zip
#> Data saved at /tmp/Rtmpw3fpFE/dftRoadSafetyData_Vehicles_2017/Veh.csv
ve = read_vehicles(year = 2017)
head(ve$accident_index)
#> [1] "2017010001708" "2017010001708" "2017010009342" "2017010009342"
#> [5] "2017010009344" "2017010009344"
Created on 2018-12-21 by the reprex package (v0.2.1)
It was the 2009 datasets for vehicles and casualties. Apologies for the missing reprex.
devtools::install_github("ITSLeeds/stats19")
#> Skipping install of 'stats19' from a github remote, the SHA1 (53a82fe9) has not changed since last install.
#> Use `force = TRUE` to force installation
library(stats19)
#> Data provided under the conditions of the Open Government License.
#> If you use data from this package, mention the source
#> (UK Department for Transport), cite the package, and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
# casualties
dl_stats19(year = 2009, type = "casualties")
#> Files identified: DfTRoadSafety_Casualties_2009.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Casualties_2009.zip
#> Data saved at /var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv
ca = read_casualties(year = 2009)
#> Warning: The following named parsers don't match the column names:
#> Accident_Index
#> Warning: 222146 parsing failures.
#> row col expected actual file
#> 1 Acc_Index no trailing characters BS70001 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
#> 2 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
#> 3 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
#> 4 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
#> 5 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv'
#> ... ......... ...................... ....... ..............................................................................................................................
#> See problems(...) for more details.
head(ca$accident_index)
#> Warning: Unknown or uninitialised column: 'accident_index'.
#> NULL
# vehicles
dl_stats19(year = 2009, type = "vehicles")
#> Files identified: DfTRoadSafety_Vehicles_2009.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Vehicles_2009.zip
#> Data saved at /var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv
ve = read_vehicles(year = 2009)
#> Warning: The following named parsers don't match the column names:
#> Accident_Index
#> Warning: 298687 parsing failures.
#> row col expected actual file
#> 1 Acc_Index no trailing characters BS70001 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
#> 2 Acc_Index no trailing characters BS70001 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
#> 3 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
#> 4 Acc_Index no trailing characters BS70002 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
#> 5 Acc_Index no trailing characters BS70003 '/var/folders/1f/2ds51cyj353cmqcy419df_6m0000gn/T//Rtmp2IlBw0/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv'
#> ... ......... ...................... ....... ..........................................................................................................................
#> See problems(...) for more details.
head(ve$accident_index)
#> Warning: Unknown or uninitialised column: 'accident_index'.
#> NULL
Created on 2018-12-21 by the reprex package (v0.2.1)
Hi David. I will have a look at 2009 and see what the issue is and report.
Hi @daranzolin thanks loads for reporting. I think it's fixed on this branch which I'm planning to merge, as demonstrated in this reprex:
devtools::install_github("ITSLeeds/stats19", ref = "index")
library(stats19)
#> Data provided under the conditions of the Open Government License.
#> If you use data from this package, mention the source
#> (UK Department for Transport), cite the package, and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
dl_stats19(year = 2009, type = "casualties")
#> Files identified: DfTRoadSafety_Casualties_2009.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Casualties_2009.zip
#> Data saved at /tmp/RtmpFsdiJv/DfTRoadSafety_Casualties_2009/DfTRoadSafety_Casualties_2009.csv
ca = read_casualties(year = 2009)
head(ca$accident_index)
#> [1] "200901BS70001" "200901BS70002" "200901BS70002" "200901BS70002"
#> [5] "200901BS70002" "200901BS70002"
dl_stats19(year = 2009, type = "vehicles")
#> Files identified: DfTRoadSafety_Vehicles_2009.zip
#> Attempt downloading from:
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/DfTRoadSafety_Vehicles_2009.zip
#> Data saved at /tmp/RtmpFsdiJv/DfTRoadSafety_Vehicles_2009/DfTRoadSafety_Vehicles_2009.csv
ve = read_vehicles(year = 2009)
head(ve$accident_index)
#> [1] "200901BS70001" "200901BS70001" "200901BS70002" "200901BS70002"
#> [5] "200901BS70003" "200901BS70003"
Created on 2018-12-23 by the reprex package (v0.2.1)
Update: the index branch has now been merged.
@adamhsparks your review is due in 1 week, thanks very much
@Robinlovelace we usually don't have maintainers submit to CRAN during the review, but if it's already on CRAN and there is a bug-fix patch, then it seems justifiable to submit to CRAN during review.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be auto-generated via Authors@R).
For packages co-submitting to JOSS
- [x] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [x] A short summary describing the high-level functionality of the software
- [x] Authors: A list of authors with their affiliations
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 7.5
Clearly much effort has gone into this package. I greatly support the sentiment behind making these data available in R having done the same for other data sets, myself. This package should be a great benefit to researchers in this area. I really appreciate the slim dependencies. It will make this package much easier to maintain into the future.
I found the package to be well documented, the vignette is helpful in illustrating some use cases for the data along with how to access it and the code is clear.
Some of the functionality I find mildly confusing, like downloading, then importing the files into the R session, and then formatting. As a user I'd prefer it all in one step, but there are likely use cases I'm not aware of that make this useful, so some examples of those would be helpful, I think.
My general comments on the code follow, with sections for the DESCRIPTION and Vignette as well. I've commented quite a bit on grammar and spelling, as I think the polish of a package is important: it lends to the perception of quality.
Thanks for inviting me to review this. It's been fun and I've learned some new things.
Per rOpenSci policy, avoid start-up messages, rely on the documentation for citation information: https://ropensci.github.io/dev_guide/building.html#recommended-scaffolding.
Avoid long lines >80 chars
Inconsistent use of white space in code; see the `find_file_name()` `if` statements for examples.
The package does not pass `R CMD check`. curl, readxl and tibble are all listed as Imports in DESCRIPTION but not imported from. With curl being used in tests, it should be in Suggests, I think. The others should be removed.
I don't think it's good form to have an example that won't work on Windows in the help file for `stats19_schema`, from data.R - line 17? Most of what I see there would be better served in a `data_raw` folder showing how the data were created, with the documentation actually documenting what the variables are, not how they were created; see http://r-pkgs.had.co.nz/data.html and, for an example, https://github.com/ropensci/GSODR/tree/master/data-raw.
I would suggest to use proper formatting in help files, when naming packages, e.g. \pkg{stats19} and when referring to documented functions or data, e.g. \code{\link{stats19_schema}}, or with single quotes around abbreviations, e.g. 'DfT'. @ColinFay has an excellent page that outlines the formatting options and when/how to use them, https://colinfay.me/writing-r-extensions/writing-r-documentation-files.html. This will greatly enhance the users' experience when using the help files by making them more readable.
I also would suggest making use of `@seealso` in documentation. For example, the `dl_stats19()` example works great in the help files, but from there I have the data and it's not in R. Using `@seealso` you can let the user know about the `read_*()` functions.
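A hypothetical roxygen2 sketch of that suggestion (the function name and cross-referenced topics are illustrative only, and the stub body exists just to make the block self-contained):

```r
# Hypothetical roxygen2 block illustrating @seealso cross-references
#' Download STATS19 data for a given year
#'
#' @seealso \code{\link{read_accidents}}, \code{\link{read_casualties}} and
#'   \code{\link{read_vehicles}} to read the downloaded files into R.
dl_stats19_sketch <- function(year) invisible(year)
```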
I downloaded files using `dl_stats19()`, selecting "Casualties", and then ran `read_accidents()` and got
Error in check_input_file(filename = filename, type = "accidents", data_dir = data_dir, :
Change data_dir, filename, year or run dl_stats19() first.
Is it possible to be more descriptive and say that I've used the wrong `read_*()` based on the file/data found, and offer to import it?
Missing "." after "e.g." in dl.R on lines 8 and 9, there may be others that I didn't spy.
Capitalisation in help files is inconsistent, e.g. lines 123-125 of read.R: parameter descriptions mix upper and lower case for the first word after the parameter itself. This also applies to other functions, where descriptions are given in all lower case or all upper case.
Testing the functionality, I get this when I expect it to tell me that `deaths` is not a valid input. But then when I hit escape, I expect it to simply exit, not provide a warning message on the way out as well.
dl_stats19(year = 1979, type = "deaths")
No files of that type found for that year.
This will download 240 MB+ (1.8 GB unzipped).
Files identified: Stats19-Data1979-2004.zip
Download now (y = enter, n = esc)?
Warning message: In find_file_name(years = year, type = type) : Coordinates unreliable in this data.
- I got caught out when using the interactive features. I read "y = enter" but hit "y" thinking that would work as well as hitting "enter", but R cancelled the operation anyway just as if I'd hit "esc"
- Per a recent conversation with CRAN, you should use `donttest()` rather than `dontrun()` for examples you don't want to be run on CRAN. Then set .travis.yml to run them by using `r_check_args: --as-cran --run-donttest`. **This may not be appropriate in all cases, e.g. interactive functions.**
- When validating user inputs and using `stop()` it's nice to use `call. = FALSE` to simplify the error message that the user receives.
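A generic sketch of that pattern (a hypothetical validator, not the package's actual code):

```r
check_year <- function(year) {
  if (!is.numeric(year) || length(year) != 1 || !year %in% 1979:2017) {
    # call. = FALSE keeps the deparsed call out of the error message,
    # so the user sees only the explanation
    stop("year must be a single year between 1979 and 2017.", call. = FALSE)
  }
  invisible(year)
}
```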
- Consider using [`hoardr`](https://ropensci.github.io/hoardr/) for managing user-saved files on disk that aren't in `tempdir()`?
- When using `utils::download.file()`, you should use `mode = "wb"` or Windows users may end up with corrupted downloads in my experience. `curl::curl_download()` does the same thing but uses more updated ways of doing it and defaults to using a binary mode (wb).
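For instance, a minimal wrapper sketching that advice (`dl_binary` is a hypothetical helper; `url` and `destfile` are placeholders):

```r
dl_binary <- function(url, destfile) {
  # mode = "wb" writes the file in binary mode; without it, Windows may
  # apply text-mode newline translation and corrupt zip downloads
  utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
}
```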
- I don't think that there is much need for the `Attempting download from` or `Reading in: ` message. If it takes that long, I would suggest to use a progress bar to show progress. But this is just a personal observation.
- Consider setting up a `pkgdown` site? It's easy to do and you can automate deployment with your Travis-CI so it's less to remember.
#### Tests
- I'm unclear how the interactive portion of the package functions is handled in testing? There are ways to handle this, but I don't see any implemented and when I run `devtools::test()` I'm asked to provide my own input.
- Suggest using `skip_on_cran()` since some of the tests can take some time to execute due to download times.
#### DESCRIPTION File
- In the DESCRIPTION file, Mark's author entry is missing his ORCID.
- More information in the DESCRIPTION's Description field would be desirable, a link to the data's website or other information to give more background perhaps.
- STATS19 should be in "'" in DESCRIPTION for CRAN, i.e., 'STATS19', I think.
- Check spelling in DESCRIPTION file, see: "analysie".
- The Description should include a link to the DfT website.
- Language field should be set, `Language: en-GB`.
#### README File(s)
- Use `remotes::install_github()` in place of `devtools::install_github()` in README.
- The code style is inconsistent in the README.Rmd file in the code chunks, e.g. line 85 is missing space around `=`.
- The example in the README showing two steps seems unnecessarily confusing to new users. If there is a good reason for having the raw data in R, document in a vignette why this is useful and show the two-step process, but if normal users won't do this, I wouldn't show it in the quick-start.
- Line 43 of README uses inconsistent "(" around the phrases compared with the other two `read_*` function descriptions.
#### Vignette
- Run spell-check on it.
- The term "attach" has a specific meaning in R. Suggest rewording the portion about installation and loading the package to omit the use of "attach", since you're not using `attach()` in the R sense (and really shouldn't use it anyway).
- I would describe why a user might want or need to install the Development version from GitHub in the vignette. Presumably if they are reading the vignette, they've already installed the package from CRAN (in the future).
- Try to consistently use `function()` to identify functions in the vignette text. This also means that if/when you use pkgdown to build a site, the functions are linked to the help file.
- In the introduction, the description of why there are `read_*()` and `format_*()` functions is confusing. To me, it reads as if `format` is only a parameter for `read_*()` in the introduction. I was left wondering why it's documented there or why the `format_*()`s even exist until I reached the end of the vignette.
- There is a comma out of place in Vignette,
>Format: Each of the read_*() functions has a format parameter which, when TRUE, adds
should be
>Format: Each of the read_*() functions has a format parameter, which, when TRUE, adds
- I'm unsure about including a package that's not on CRAN in the vignette (`ukboundaries`), something like this should be listed in Suggests, but it's not on CRAN, @sckott do you have any thoughts?
- The first figures in the `sf` section after the join aren't immediately clear to me. The axes lack labels; I'm not really sure what I'm looking at.
#### Meta
- The contributing guidelines mention a `pkgdown` website, this does not exist.
- The JOSS paper needs spelling check run on it to correct spelling errors.
This review is also hosted as a repository on my GitHub, https://github.com/adamhsparks/stats19-review
Also, as I've been working on this, I've modified the DESCRIPTION file to fix some errors, and address most of the points I raised. I've also run spellcheck on the vignette if you'd accept a PR.
thanks for your review @adamhsparks !
That is a great review @adamhsparks. Thank you. Thanks @sckott appreciate your follow up!
Sorry, I missed the paper.md in the root dir. I've updated my review to reflect this.
Hi @adamhsparks no problem, and just saw the PR. Will look now. Meanwhile, I've put our work-in-progress response to your review here: https://github.com/ITSLeeds/stats19/issues/63 - any questions, please ask. Very grateful for the level of detail and knowledge in the review. Thank you!
One question, probably more for @sckott: does JOSS accept figures and code chunks? Thinking that could liven it up a bit...
OK just merged those 2 mini PRs. That raises another question: are both reviewers happy to be in the author list with the new `rev` tag? More info here: https://ropensci.org/blog/2018/03/16/thanking-reviewers-in-metadata/
Hi @Robinlovelace, happy to be listed as a reviewer, "rev", and contributor, "ctb", for the pull requests you've accepted. My ORCID: 0000-0002-0061-8359.
@Robinlovelace `rev` is great for me! Thanks for asking.
Great, I've added both reviewers in https://github.com/ITSLeeds/stats19/pull/67/files
@Robinlovelace will ask, not sure about the JOSS question
AFAICT figures can be included https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain - it doesn't say anything about code chunks, but you could try
Happy to say, we've finally responded to all the points made by @adamhsparks. We've learned a lot in the process, and have added an auto deploy website (see https://itsleeds.github.io/stats19/ ), plus various current best practice things like `donttest` that we didn't know about. Thanks loads!
Any further comments/suggestions: very welcome.
Overall, we found all the suggestions sensible, linking to previous discussions about combining the three-stage (`dl`, `read`, `format`) process into a single function call, which we were thinking of calling `get_stats19()`.
The reason for splitting the process up is to ensure maximum transparency and to give the user control over what the package is doing. However, as long as it is properly documented, we think the benefits of a `get_stats19()` function will outweigh any possible negatives we can think of. `get_stats19()` has now been added!
Thanks for the comments, we have indeed tried to keep dependencies to a minimum but consider `readr` worthwhile. `readxl` and `tibble` have been removed. `curl` has been demoted to `Suggests`, as detailed in another comment.
Thanks. If you think of other ways we can communicate the value of the data, do let us know (I think the second mapping figure could be improved...).
We have long been planning to add a `get_stats19()` function, as per https://github.com/ITSLeeds/stats19/issues/11
The review comment, combined with further discussion, has triggered us to re-prioritise it.
It's been beneficial to polish each of the component functions first, however, and good to document each stage for maximum transparency, so we plan to keep the `dl`, `read` and `format` functions exported.
Agreed.
The guidance is to 'Only use package startup messages when necessary'.
A case can be made that this is necessary.
As with `osmdata`, the package provides access to data that has a license that requires it to be cited. The `osmdata` load message is as follows:

```r
library(osmdata)
```
We fully agree with the reasoning behind removing package startup messages, however. As a compromise, we've shortened the startup from 4 lines to 2:

```
# before:
# Data provided under the conditions of the Open Government License.
# If you use data from this package, mention the source
# (UK Department for Transport), cite the package, and link to:
# www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.

# after:
library(stats19)
Data provided under OGL v3.0. Cite the source and link to:
www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
```
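For reference, a message like this is normally emitted from an `.onAttach()` hook via `packageStartupMessage()`, which (unlike `message()`) users can silence with `suppressPackageStartupMessages()`. A sketch, not necessarily the package's actual implementation:

```r
# zzz.R (sketch): startup text belongs in .onAttach(), and should use
# packageStartupMessage() so users can suppress it.
.onAttach = function(libname, pkgname) {
  packageStartupMessage(
    "Data provided under OGL v3.0. Cite the source and link to:\n",
    "www.nationalarchives.gov.uk/doc/open-government-licence/version/3/"
  )
}
```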
Running `goodpractice::gp()` found the following lines with > 80 characters:

```
R/format.R:62:1
R/format.R:67:1
R/read.R:141:1
R/utils.R:167:1
```

All these have been fixed.
[x] Inconsistent use of white spaces in code, see `find_file_name()` `if` statements for examples.
[x] The package does not pass `R CMD check`. curl, readxl and tibble are all listed as Imports in DESCRIPTION but not imported from. With curl being used in tests, this means it should be in Suggests, I think. The others should be removed.
Kindly make sure you `build` before running `check` on the resulting `tar.gz` archive.
`curl` is used in the tests and `readxl` is used in the examples. These have been demoted to `Suggests`. `tibble` has been removed from the DESCRIPTION file.
`stats19_schema`, from data.R - line 17? Most of what I see there would be better served in a `data-raw` folder showing how the data were created, with the documentation actually documenting what the variables are, not how they were created; see http://r-pkgs.had.co.nz/data.html and, for an example, https://github.com/ropensci/GSODR/tree/master/data-raw.

See https://github.com/ITSLeeds/stats19/pull/68
Acronyms have been explained, 'STATS19' is now better formatted (note: it's capitalised when referring to the data, but lower case when referring to the package), and a few other improvements to the docs have been made. See https://github.com/ITSLeeds/stats19/commit/e4ab09aa716630322aadd26dee3465258a2e58a0
Use `@seealso` in documentation. For example, the `dl_stats19()` example works great in the help files, but from there I have the data but it's not in R. Using `@seealso` you can let the user know about the `read_*()` functions.

`@seealso` tags have now been added to a few key documentation pages.
I used `dl_stats19()`, selecting "Casualties", and then ran `read_accidents()` and got an error. Is it possible to be more descriptive and say that I've used the wrong `read_*()` based on the file/data found and offer to import it?

Yes, but I think the costs of doing this would outweigh the benefits (RL).
Fixed in https://github.com/ITSLeeds/stats19/commit/777479b9234be893a9946c787ef23cc08ff6a41a
Fixed in https://github.com/ITSLeeds/stats19/commit/aec13fc0673717d68a3eb159c066d76a8d0645ca
Testing the functionality, I get this, when I expect it to tell me that `deaths` is not a valid input. But then when I hit escape, I expect it to simply exit, not provide a warning message on the way out as well.

```
dl_stats19(year = 1979, type = "deaths")
No files of that type found for that year.
This will download 240 MB+ (1.8 GB unzipped).
Files identified: Stats19-Data1979-2004.zip
Download now (y = enter, n = esc)?
Warning message:
In find_file_name(years = year, type = type) :
  Coordinates unreliable in this data.
```
Now produces the following output: `Error in dl_stats19(year = 1979, type = "deaths") : Stopping as requested`
Agreed, that was an issue. Fixed in the latest version (tested with `stats19::dl_stats19(year = 1979)`).
Per a recent conversation with CRAN, you should use `donttest()` rather than `dontrun()` for examples you don't want to be run on CRAN. Then set .travis.yml to run them by using `r_check_args: --as-cran --run-donttest`. This may not be appropriate in all cases, e.g. interactive functions.

Trying this, as per: https://github.com/ITSLeeds/stats19/pull/69
When validating user inputs and using `stop()` it's nice to use `call. = FALSE` to simplify the error message that the user receives.

Added as requested. See https://github.com/ITSLeeds/stats19/pull/70
Consider using `hoardr` for managing user-saved files on disk that aren't in `tempdir()`?

We have considered this and have taken a good look at the package. In this case, however, I don't think it's appropriate: saving the 1/4 GB .zip file associated with the 1979:2005 data, for example, does not make sense when it can be extracted and saved in a more useful format. We assume the user will extract and save the data they use. Furthermore, the DfT is trialling an API which should make the desire to save intermediate files even rarer. Users are told where the files are saved, so are free to move them where they like.
When using `utils::download.file()`, you should use `mode = "wb"` or Windows users may end up with corrupted downloads in my experience. `curl::curl_download()` does the same thing but uses more modern methods and defaults to binary mode (wb).

We have tested on Windows and have not found any issues with the `dl_stats19()` function, so treating this as a 'wontfix' unless a strong reason can be found for adding this.
I don't think that there is much need for the `Attempting download from` or `Reading in: ` message. If it takes that long, I would suggest using a progress bar to show progress. But this is just a personal observation.

There are read and download progress bars for large datasets. Try this, for example:

```r
dl_stats19(year = 1979)
```
Consider setting up a `pkgdown` site? It's easy to do and you can automate deployment with your Travis-CI so it's less to remember.

A site has been added: https://itsleeds.github.io/stats19/

I'm testing auto-deploy on travis and will happily merge this branch if it works (RL): https://github.com/ITSLeeds/stats19/pull/71
Update: auto deploy failed so reverting that change to .travis.yml. It was pkgdown that was making it fail. See https://travis-ci.org/ITSLeeds/stats19/builds/475415270#L1622 And: https://ropenscilabs.github.io/travis/
Is this an rOpenSci pkg issue and is it worth asking for support?
I'm unclear how the interactive portion of the package functions is handled in testing? There are ways to handle this, but I don't see any implemented and when I run `devtools::test()` I'm asked to provide my own input.

When R is running non-interactively it auto-downloads the necessary files or produces an error. All tests involving downloads work on our computers and Travis. They can be turned off for the duration of your session with:

```r
Sys.setenv(DONT_DOWNLOAD_ANYTHING = "true")
```

Not sure if that answers the question - any further feedback on this: welcome.
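That session-level switch can also be honoured inside the test suite itself. A sketch using `testthat` (the helper name `skip_if_no_download` is made up, not an exported function):

```r
library(testthat)

# Made-up helper: skip download-heavy tests on CRAN, or whenever the
# DONT_DOWNLOAD_ANYTHING switch described above is set for the session.
skip_if_no_download = function() {
  skip_on_cran()
  if (identical(Sys.getenv("DONT_DOWNLOAD_ANYTHING"), "true")) {
    skip("Downloads disabled for this session")
  }
}

test_that("downloads run only when allowed", {
  skip_if_no_download()
  succeed() # a real test would call dl_stats19() here
})
```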
Suggest using `skip_on_cran()` since some of the tests can take some time to execute due to download times.

Added in https://github.com/ITSLeeds/stats19/pull/72
See fixes here: https://github.com/ITSLeeds/stats19/commit/fdeb634dbab5a39ab599b35693623d0b1c94f71d
[x] In the DESCRIPTION file, Mark's author entry is missing his ORCID.
[x] More information in the DESCRIPTION's Description field would be desirable, a link to the data's website or other information to give more background perhaps.
[x] STATS19 should be in "'" in DESCRIPTION for CRAN, i.e., 'STATS19', I think.
[x] Check spelling in DESCRIPTION file, see: "analysie"
[x] The Description should include a link to the DfT website.
[x] Language field should be set, `Language: en-GB`
[x] Use `remotes::install_github()` in place of `devtools::install_github()` in README.
[x] The code style is inconsistent in the README.Rmd file in the code chunks, e.g. line 85 is missing space around `=`.
[x] The example in the README showing two steps seems unnecessarily confusing to new users. If there is a good reason for having the raw data in R, document in a vignette why this is useful and show the two-step process, but if normal users won't do this, I wouldn't show it in the quick-start.
[x] Line 43 of README uses inconsistent "(" around the phrases compared with the other two `read_*` function descriptions.
[x] Run spell-check on it.
[x] The term "attach" has a specific meaning in R. Suggest rewording the portion about installation and loading the package to omit the use of "attach", since you're not using `attach()` in the R sense (and really shouldn't use it anyway).
[x] I would describe why a user might want or need to install the Development version from GitHub in the vignette. Presumably if they are reading the vignette, they've already installed the package from CRAN (in the future).
[x] Try to consistently use `function()` to identify functions in the vignette text. This also means that if/when you use pkgdown to build a site, the functions are linked to the help file.
[x] In the introduction, the description of why there are `read_*()` and `format_*()` functions is confusing. To me, it reads as if `format` is only a parameter for `read_*()` in the introduction. I was left wondering why it's documented there or why the `format_*()`s even exist until I reached the end of the vignette.
[x] There is a comma out of place in Vignette,
[x] Format: Each of the read_*() functions has a format parameter which, when TRUE, adds
should be
[x] Format: Each of the read_*() functions has a format parameter, which, when TRUE, adds
[x] I'm unsure about including a package that's not on CRAN in the vignette (`ukboundaries`), something like this should be listed in Suggests, but it's not on CRAN, @sckott do you have any thoughts?

This is a good point.

Fixed, by adding a much more useful dataset representing the jurisdictions of police forces across England and Wales: `police_boundaries`.
The first figures in the `sf` section after the join aren't immediately clear to me. The axes lack labels; I'm not really sure what I'm looking at.

Agreed. The maps were not particularly clear. An overview map has been added to the vignette. For context, and to ensure README-vignette harmony, we also changed the README's mapping section. It now also contains an overview map so people know what they are looking at, but not the facetted map, which was confusing.
The contributing guidelines mention a `pkgdown` website; this does not exist.

I, too, learned a lot from this review @sckott, @daranzolin and @adamhsparks. I really appreciate all your contributions. Fantastic open source vibes.
Thanks @Robinlovelace and @mpadge awesome work.
Thanks, @Robinlovelace, @layik and @mpadge. I'll have a look at the changes shortly and respond.
Thank you @Robinlovelace, @layik and @mpadge for your consideration of my comments and the incorporation of many of them as you see fit. However, there are still a few issues (most of which you did attempt to previously address but seem to have slipped through) that I think should be fixed before I sign off on it.
Use `spelling::spell_check_package()` and correct the errors.
There's a line >80 chars. Not a hard-and-fast rule, but nice to observe, and this one can easily be split over two lines. It is line 149 of read.R.
In the data.R file there are several examples that say something like: "The file names were generated as follows:" IMO any example like this needs to be in `data-raw` in Rmd files, with only the data in the final package and proper documentation as to what the data are and a link to where they were obtained from. A link to the `data-raw` files would be appropriate in the documentation for each data set.
From Hadley's R packages - Data page: http://r-pkgs.had.co.nz/data.html
Often, the data you include in data/ is a cleaned up version of raw data you’ve gathered from elsewhere. I highly recommend taking the time to include the code used to do this in the source version of your package. This will make it easy for you to update or reproduce your version of the data. I suggest that you put this code in data-raw/. You don’t need it in the bundled version of your package,...
Emphasis mine.
There is also an error in the examples for `police_boundaries`. I cannot reproduce the data set as you have in your example.
```
> library(sf)
Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
> u = paste0("https://opendata.arcgis.com/",
+ "datasets/3e5a096a8c7c456fb6d3164a3f44b005_3.kml"
+ )
> police_boundaries_wgs = read_sf(u)
Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
  GDAL Error 1: JSON parsing error: unexpected character (at offset 0)
> police_boundaries = st_transform(police_boundaries_wgs, 27700)
> police_boundaries = police_boundaries[c("pfa16cd", "pfa16nm")]
Error: Can't find columns `pfa16cd`, `pfa16nm` in `.data`.
Execution halted
```
The warning with GDAL can be circumvented if I first download the file then import it. Not that it matters, the error still occurs since there are no columns named `pfa16cd` or `pfa16nm` in the data frame.
One other suggestion for you is to add a CITATION file so that `citation("stats19")` can be used to generate a reference for citing it from R.
These all sound reasonable @adamhsparks. We'll get on it and hope to 'resubmit' by early next week. Note to self: add a link to show changes that have happened between now and then: https://help.github.com/articles/comparing-commits-across-time/
If it helps, here's the CITATION file for `nasapower`, which has a JOSS article and a manual (to give the release version) both in it, as well as info on citing the POWER data properly.
https://github.com/ropensci/nasapower/blob/master/inst/CITATION
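In outline, an `inst/CITATION` file looks something like the following (a sketch only: the title, author and year fields are placeholders, not the actual stats19 metadata):

```r
# inst/CITATION (sketch; metadata fields are placeholders)
citHeader("To cite stats19 in publications, please use:")

citEntry(
  entry  = "Manual",
  title  = "stats19: placeholder title",
  author = "Placeholder Author",
  year   = "2019",
  url    = "https://github.com/ITSLeeds/stats19",
  textVersion = paste(
    "Placeholder Author (2019). stats19: placeholder title.",
    "https://github.com/ITSLeeds/stats19"
  )
)
```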
Thanks @adamhsparks! Timely followup from @Robinlovelace as always.
Great, thanks for showing us this example. I imagine the citation will change if/when it goes into JOSS.
Many thanks for providing follow-up comments, they have been really useful and led to further improvements in the package. All the changes in response to them can be found here: https://github.com/ITSLeeds/stats19/compare/3f38d4cd9144497a0884c1c49c06f680b674acb7...master
Specifically:
Use `spelling::spell_check_package()` and correct the errors
See https://github.com/ITSLeeds/stats19/commit/1d83af97d12c148d2ddd07d6753a0bebdfaff7e8
In the data.R file there are several examples that say something like: "The file names were generated as follows:" IMO any example like this needs to be in the data-raw in Rmd files and only the data in the final package with proper documentation as to what the data are and a link to where they were obtained from. A link to the data-raw files would be appropriate in the documentation for each data set.
The code to get `police_boundaries` (now in `data-raw`) has been updated to be more robust. It worked on my Ubuntu machine but I suspect it failed on Mac. Maybe one for the `sf` issue tracker if so. In any case, here's a reprex from the new code that should work on any OS:
```r
library(sf)
#> Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
u = "https://opendata.arcgis.com/datasets/3e5a096a8c7c456fb6d3164a3f44b005_3.geojson"
police_boundaries_wgs = sf::st_read(u)
#> Reading layer `3e5a096a8c7c456fb6d3164a3f44b005_3' from data source `https://opendata.arcgis.com/datasets/3e5a096a8c7c456fb6d3164a3f44b005_3.geojson' using driver `GeoJSON'
#> Simple feature collection with 43 features and 9 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -6.402838 ymin: 49.86709 xmax: 1.761138 ymax: 55.81109
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
names(police_boundaries_wgs)
#> [1] "objectid"       "pfa16cd"        "pfa16nm"        "bng_e"
#> [5] "bng_n"          "long"           "lat"            "st_areashape"
#> [9] "st_lengthshape" "geometry"
police_boundaries = st_transform(police_boundaries_wgs, 27700)
names(police_boundaries)
#> [1] "objectid"       "pfa16cd"        "pfa16nm"        "bng_e"
#> [5] "bng_n"          "long"           "lat"            "st_areashape"
#> [9] "st_lengthshape" "geometry"
police_boundaries = police_boundaries[c("pfa16cd", "pfa16nm")]
```
Created on 2019-01-09 by the reprex package (v0.2.1)
I think that's everything. Any further feedback, very welcome.
Thanks @Robinlovelace. Thanks @adamhsparks, @sckott and @daranzolin one more time from me.
Looks great! I've amended my review and recommend accepting this package to rOpenSci now.
Summary
The goal of stats19 is to make it easy to work with road crash data. Specifically it enables access to and processing of the UK’s official road traffic casualty database, which is called STATS19.
URL for the package (the development repository, not a stylized html page): https://github.com/ITSLeeds/stats19
Please indicate which category or categories from our package fit policies this package falls under *and why* (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
data retrieval: downloads and formats publicly available road safety data that is difficult (impossible for most people) to use effectively without it
Who is the target audience and what are scientific applications of this package?
Academic, industry and public sector researchers investigating road safety, specifically people wanting to perform natural experimental studies to find out how best to make the transport system safer. It will also be of interest for people interested in large point-on-network data for methodological applications.
Previous packages/code-bases:
code in the bikeR repo underlying an academic paper on collisions involving cyclists
functions in stplanr for downloading Stats19 data
updated functions related to the CyIPT project
but this package takes the best of the preceding packages and adds new features including:
automated and reproducible data formatting code based on DfT's official guidelines (before labels were hard-coded)
continuous integration
integration with sf (the stplanr implementation used sp)
inclusion of much more data (back to 1974 and until 2017, previous work only covered 1985:2015)
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
[x] The package contains a `paper.md` matching JOSS's requirements with a high-level description in the package root or in `inst/`.

Detail
[x] Does `R CMD check` (or `devtools::check()`) succeed? Paste and describe any errors or warnings:

[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
Currently the style uses `=` assignment, which differs from the recommended style.

If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: