Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type `@ropensci-review-bot help` for help.
:rocket:
Editor check started
:wave:
git hash: 815f2a79
Important: All failing checks above must be addressed prior to proceeding
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:------------|------:|
|internal |utils | 26|
|internal |base | 25|
|imports |crayon | NA|
|imports |dplyr | NA|
|imports |stringr | NA|
|suggests |distill | NA|
|suggests |eurostat | NA|
|suggests |formatR | NA|
|suggests |ggalluvial | NA|
|suggests |ggfittext | NA|
|suggests |ggplot2 | NA|
|suggests |ggpubr | NA|
|suggests |ggrepel | NA|
|suggests |gridExtra | NA|
|suggests |kableExtra | NA|
|suggests |knitr | NA|
|suggests |raster | NA|
|suggests |RColorBrewer | NA|
|suggests |readr | NA|
|suggests |rmarkdown | NA|
|suggests |sf | NA|
|suggests |terra | NA|
|suggests |testthat | NA|
|suggests |tidyr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
**utils:** data (26)

**base:** get (5), paste0 (5), c (4), names (4), attributes (2), by (2), missing (2), list (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 7 files)
- 2 authors
- 1 vignette
- 4 internal data files
- 3 imported packages
- 3 exported functions (median 132 lines of code)
- 3 non-exported functions in R (median 260 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  |  value| percentile|noteworthy |
|:------------------------|------:|----------:|:----------|
|files_R                  |      7|       45.7|           |
|files_vignettes          |      2|       85.7|           |
|files_tests              |      4|       79.0|           |
|loc_R                    |    602|       53.3|           |
|loc_vignettes            |    874|       88.9|           |
|loc_tests                |    702|       81.2|           |
|num_vignettes            |      1|       64.8|           |
|data_size_total          | 625034|       92.9|           |
|data_size_median         | 156882|       93.1|           |
|n_fns_r                  |      6|        6.6|           |
|n_fns_r_exported         |      3|       12.9|           |
|n_fns_r_not_exported     |      3|        5.3|           |
|n_fns_per_file_r         |      1|        0.2|TRUE       |
|num_params_per_fn        |      6|       79.0|           |
|loc_per_fn_r             |    193|       99.1|TRUE       |
|loc_per_fn_r_exp         |    132|       94.8|           |
|loc_per_fn_r_not_exp     |    260|       99.5|TRUE       |
|rel_whitespace_R         |     18|       55.4|           |
|rel_whitespace_vignettes |     29|       89.2|           |
|rel_whitespace_tests     |     12|       68.9|           |
|doclines_per_fn_exp      |     60|       72.8|           |
|doclines_per_fn_not_exp  |      0|        0.0|TRUE       |
|fn_call_network_size     |      0|        0.0|TRUE       |

---
Click to see the interactive network visualisation of calls between objects in package
#### 3. `goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/AAoritz/nuts/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AAoritz/nuts/actions)

**GitHub Workflow Results**

|         id|name                                           |conclusion |sha    | run_number|date       |
|----------:|:----------------------------------------------|:----------|:------|----------:|:----------|
| 7671209982|pages build and deployment with artifacts-next |success    |4aab2b |         11|2024-01-26 |
| 7671179570|pkgdown                                        |success    |815f2a |         14|2024-01-26 |
| 7671179564|R-CMD-check                                    |success    |815f2a |          8|2024-01-26 |
| 7671179577|test-coverage                                  |success    |815f2a |         12|2024-01-26 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following check_fail:

1. no_import_package_as_a_whole

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 94.97

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
convert_nuts_level | 24
convert_nuts_version | 21
classify_nuts | 19

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 184 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 22
Lines should not be more than 80 characters. | 116
Use <-, not =, for assignment. | 46
|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3.9  |
|pkgcheck |0.1.2.11 |
Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.
Hi rOpenSci Team and @ropensci-review-bot! Our vignette is built with `distill`, but the bot's check-vignette.R seems to prefer `rmarkdown` HTML files. Should we switch to `rmarkdown`? Thank you for your time and consideration!
@AAoritz Sorry about the inconvenience there. We've updated our system to detect distill vignettes, so your package will now pass those tests. (The server might take a few days to incorporate those updates; please be patient.)
Wow, thanks @mpadge! Looking forward to the review process!
Yep, thanks @mpadge. Will wait a bit of sending another check request to let those changes work through.
@AAoritz I think this looks good and is a fit for rOpenSci. I will start working on finding a handling editor for your submission!
Great news @jhollist! Thank you!
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: cf771ced
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:------------|------:|
|internal |utils | 26|
|internal |base | 25|
|imports |crayon | NA|
|imports |dplyr | NA|
|imports |stringr | NA|
|suggests |distill | NA|
|suggests |eurostat | NA|
|suggests |formatR | NA|
|suggests |ggalluvial | NA|
|suggests |ggfittext | NA|
|suggests |ggplot2 | NA|
|suggests |ggpubr | NA|
|suggests |ggrepel | NA|
|suggests |gridExtra | NA|
|suggests |kableExtra | NA|
|suggests |knitr | NA|
|suggests |raster | NA|
|suggests |RColorBrewer | NA|
|suggests |readr | NA|
|suggests |rmarkdown | NA|
|suggests |sf | NA|
|suggests |terra | NA|
|suggests |testthat | NA|
|suggests |tidyr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
**utils:** data (26)

**base:** get (5), paste0 (5), c (4), names (4), attributes (2), by (2), missing (2), list (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 7 files)
- 2 authors
- 1 vignette
- 4 internal data files
- 3 imported packages
- 3 exported functions (median 132 lines of code)
- 3 non-exported functions in R (median 260 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  |  value| percentile|noteworthy |
|:------------------------|------:|----------:|:----------|
|files_R                  |      7|       45.7|           |
|files_vignettes          |      2|       85.7|           |
|files_tests              |      4|       79.0|           |
|loc_R                    |    602|       53.3|           |
|loc_vignettes            |    874|       88.9|           |
|loc_tests                |    702|       81.2|           |
|num_vignettes            |      1|       64.8|           |
|data_size_total          | 625034|       92.9|           |
|data_size_median         | 156882|       93.1|           |
|n_fns_r                  |      6|        6.6|           |
|n_fns_r_exported         |      3|       12.9|           |
|n_fns_r_not_exported     |      3|        5.3|           |
|n_fns_per_file_r         |      1|        0.2|TRUE       |
|num_params_per_fn        |      6|       79.0|           |
|loc_per_fn_r             |    193|       99.1|TRUE       |
|loc_per_fn_r_exp         |    132|       94.8|           |
|loc_per_fn_r_not_exp     |    260|       99.5|TRUE       |
|rel_whitespace_R         |     18|       55.4|           |
|rel_whitespace_vignettes |     29|       89.2|           |
|rel_whitespace_tests     |     12|       68.9|           |
|doclines_per_fn_exp      |     60|       72.8|           |
|doclines_per_fn_not_exp  |      0|        0.0|TRUE       |
|fn_call_network_size     |      0|        0.0|TRUE       |

---
Click to see the interactive network visualisation of calls between objects in package
#### 3. `goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/AAoritz/nuts/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AAoritz/nuts/actions)

**GitHub Workflow Results**

|         id|name                                           |conclusion |sha    | run_number|date       |
|----------:|:----------------------------------------------|:----------|:------|----------:|:----------|
| 7694130869|pages build and deployment with artifacts-next |success    |38d719 |         13|2024-01-29 |
| 7694097562|pkgdown                                        |success    |cf771c |         16|2024-01-29 |
| 7694097559|R-CMD-check                                    |success    |cf771c |         10|2024-01-29 |
| 7694097589|test-coverage                                  |success    |cf771c |         14|2024-01-29 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following check_fail:

1. no_import_package_as_a_whole

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 94.97

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
convert_nuts_level | 24
convert_nuts_version | 21
classify_nuts | 19

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 184 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 22
Lines should not be more than 80 characters. | 116
Use <-, not =, for assignment. | 46
|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3.9  |
|pkgcheck |0.1.2.13 |
This package is in top shape and may be passed on to a handling editor
@AAoritz looking good! Assigning editor in just a few.
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
Many thanks for your submission @AAoritz & @krausewe! Useful package! I know of the French equivalent to nuts, COGjugaison, so was vaguely aware of the idea https://antuki.github.io/COGugaison/ :smile_cat:
I have a few comments that I'd like to resolve or discuss before I proceed to the reviewer search.
In the CONTRIBUTING.md file, could you please remove the tidyverse-specific bits such as "For more detailed info about contributing to this, and other tidyverse packages,"?
I was surprised to see optional dependencies before Imports in DESCRIPTION. Would you mind running `desc::desc_normalize()` and seeing whether you like the result?
Instead of remotes in the README you could recommend pak, `pak::pak("AAoritz/nuts")`, but in any case I don't think the `force = TRUE` argument is warranted, is it?
Since there is only one vignette I'd recommend naming it "nuts.Rmd" so that it is automatically added in the pkgdown website navbar as "Get started". See https://pkgdown.r-lib.org/reference/build_articles.html#get-started
Your images have alternative text but that text is not complete, it does not describe the image so that a screenreader user would miss a lot. https://www.w3.org/WAI/tutorials/images/informative/
I was surprised by the short pipeline in https://github.com/AAoritz/nuts/blob/cf771ced02a20b226b56e6908959044ca62804e4/vignettes/nuts-vignette.Rmd#L247. Could it avoid using the pipe, or, if it keeps it, could `filter(` go on a separate line? I feel it's an unusual pattern (but I might be wrong). I see this pattern of one-line pipelines in the tests too, so it might be a personal preference! It's more noticeable/important in user-facing code, but I won't impose my personal style.
The reference index is short but it might make sense to group the functions then the datasets. https://pkgdown.r-lib.org/reference/build_reference.html#reference-index
The output list could use named elements, for instance:

```r
output <- list(
  data = data,
  versions_data = data_all_versions,
  missing_data = data_missing_nuts
)
```

so that the user could then use `output[["missing_data"]]` instead of `output[[3]]`, which is less readable.
You use `stop()` for error messages. The crayon package has been superseded by cli (see https://cli.r-lib.org/ and https://blog.r-hub.io/2023/11/30/cliff-notes-about-cli/), according to https://cran.r-project.org/web/packages/crayon/index.html. You could keep using crayon (it's still supported) but it might make sense to switch, as cli provides really nice features. Code like `stop("Input 'nuts_code' must be provided as a string.")` could become `cli::cli_abort("Input {.arg nuts_code} must be provided as a string, not {.obj_type_friendly {nuts_code}}.")` (note that this function also requires importing rlang in DESCRIPTION), and info messages could use `cli_alert_*()` functions: https://cli.r-lib.org/reference/cli_alert.html
Why does the package import whole packages rather than individual functions? Furthermore the imports should happen only once so they could happen in the package-level manual page source. You can create it with https://usethis.r-lib.org/reference/use_package_doc.html whose docs say "This .R file is also a good place for roxygen directives that apply to the whole package (vs. a specific function), such as global namespace tags like @importFrom."
In lines such as https://github.com/AAoritz/nuts/blob/cf771ced02a20b226b56e6908959044ca62804e4/R/convert_nuts_version.R#L128 the code might benefit from the use of "explaining variables", for instance (with a better name probably, again just an example :sweat_smile: ):

```r
not_enough_codes <- (length(data$from_code[check_nuts_codes]) < length(data$from_code) &&
  length(data$from_code[check_nuts_codes]) > 0)

if (not_enough_codes)
```

(oh, and note the use of `&&` instead of `&`, see https://lintr.r-lib.org/reference/vector_logic_linter.html)
You do not need `data()` to load package data in tests.
I'd recommend not having top-level code in tests, which means no code outside of `test_that()`. This makes tests more self-contained. So in a helper file, tests/testthat/helper-test-data.R for instance, you'd have the function
```r
manure_indic_DE_2003 <- function() {
  manure %>%
    filter(nchar(geo) == 4) %>%
    filter(indic_ag == "I07A_EQ_Y") %>%
    select(-indic_ag) %>%
    filter(grepl("^DE", geo)) %>%
    filter(time == 2003) %>%
    select(-time)
}
```
and in tests/testthat/test-classify_nuts.R you'd have
```r
test_that("Needs geo var 1", {
  expect_error(
    manure_indic_DE_2003() %>% classify_nuts(nuts_code = NULL),
    "Input 'nuts_code' cannot be NULL."
  )
})

test_that("Needs geo var 2", {
  expect_error(manure_indic_DE_2003() %>% classify_nuts())
})

test_that("nuts_code not valid", {
  expect_error(
    manure_indic_DE_2003() %>% classify_nuts(nuts_code = 1),
    "Input 'nuts_code' must be provided as a string."
  )
})
```
See https://blog.r-hub.io/2020/11/18/testthat-utility-belt/ and https://r-pkgs.org/testing-design.html + https://r-pkgs.org/testing-advanced.html
Instead of `set.seed()`, in tests please use the withr version: https://withr.r-lib.org/reference/with_seed.html

Happy to answer any question and to discuss! Thanks again for submitting your package!
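For illustration, here is a minimal sketch of the withr approach (assuming the withr package is installed; the data drawn is just an example):

```r
# withr::with_seed() sets the seed only for the wrapped expression and
# restores the previous RNG state afterwards, so a test stays reproducible
# without leaking a global seed into other tests.
x <- withr::with_seed(42, rnorm(3))
y <- withr::with_seed(42, rnorm(3))
identical(x, y)  # TRUE: same seed, same draws
```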
Hi @maelle ! Many thanks for these super useful checks and comments. @krausewe and I start looking into them now and we will get back to you soon.
Hi @maelle!
We have addressed the very helpful remarks & recommendations.
Please find below the list of commits.
We are looking forward to the next steps!
- CONTRIBUTING: https://github.com/AAoritz/nuts/commit/f5c4a5a4049c52c9f34dd253d99a0eb2515af164
- DESCRIPTION: https://github.com/AAoritz/nuts/commit/b80c8db0ad253a631b6b06f8fa20a61523c0d218
- pak in README: https://github.com/AAoritz/nuts/commit/51c2e44addb5b5052421b967cb9ddc11837fb6af
- nuts.Rmd: https://github.com/AAoritz/nuts/commit/3fe96c87ece74a6e28d35f350302d282523238e8
- pkgdown: https://github.com/AAoritz/nuts/commit/f11a08070d0c1356339936c742165cfe6a551c34
- classify: https://github.com/AAoritz/nuts/commit/a5bbea99ace46bd32be7bc0628d239ebe0f319f4
- cli messages: https://github.com/AAoritz/nuts/commit/8cb9be6e0fa526f1fc43cbaeb8ab5b3bec89c8e8
- `&&`: https://github.com/AAoritz/nuts/commit/4bc0303f8b2be824ac554d31cb772ac27d242140
- `data()`: https://github.com/AAoritz/nuts/commit/e2ba58f62169b8c84b33817faf305dfc6aeb6c09
- `set.seed()` with withr: https://github.com/AAoritz/nuts/commit/0d9e3ad4084780c69179f52afd21c30b1eedba6c

Awesome, thank you! I'll now look for reviewers.
Note that @mpadge's and my tech note is now published: https://ropensci.org/blog/2024/02/06/verbosity-control-packages/
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/623_status.svg)](https://github.com/ropensci/software-review/issues/623)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
Added the peer review status badge. NEWS.md was already created and included.
@ropensci-review-bot add @nolwenn to reviewers
@nolwenn added to the reviewers list. Review due date is 2024-03-01. Thanks @nolwenn for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@nolwenn: If you haven't done so, please fill this form for us to update our reviewers records.
Hi @maelle! We are a bit lost with implementing verbosity control with `local_options()`. This example from your blog does not seem to work with cli:
```r
> pkg_message <- function(...) {
+   is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
+   if (is_verbose_mode) {
+     # Options local to this function only; reset on exit!
+     rlang::local_options(rlib_message_verbosity = "verbose")
+   }
+   rlang::inform(...)
+ }
> pkg_message("normal message")
normal message
> rlang::local_options(rlib_message_verbosity = "quiet")
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
> pkg_message("suppressed message")
> rlang::local_options(mypackage.verbose = "verbose")
> pkg_message("reawakened message")
reawakened message
> withr::deferred_run()
Ran 2/2 deferred expressions
> pkg_message <- function(...) {
+   is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
+   if (is_verbose_mode) {
+     # Options local to this function only; reset on exit!
+     rlang::local_options(rlib_message_verbosity = "verbose")
+   }
+   cli::cli_h1(...)
+ }
> pkg_message("normal message")
── normal message ───────────────────────────────────────────────────────────────
> rlang::local_options(rlib_message_verbosity = "quiet")
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
> pkg_message("suppressed message")
── suppressed message ───────────────────────────────────────────────────────────
> rlang::local_options(mypackage.verbose = "verbose")
> pkg_message("reawakened message")
── reawakened message ───────────────────────────────────────────────────────────
```
Right, it seems specific to rlang (cc @mpadge).
How about something simpler like
```r
pkg_message <- function(...) {
  is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
  if (is_verbose_mode) {
    cli::cli_alert(...)
  }
}

withr::with_options(list("mypackage.verbose" = "quiet"), {
  pkg_message("pof")
})

withr::with_options(list("mypackage.verbose" = "verbose"), {
  pkg_message("pof")
})
#> → pof
```
Created on 2024-02-13 with reprex v2.1.0
:thinking:
@AAoritz The options we wrote about in the blog post only apply to the `cli_inform()`, `cli_warn()`, and `cli_abort()` functions, not to any other cli functions. If you want to use others such as `cli_h<N>()`, you'd need to further adapt the custom handler to not call that at all if `verbose == "quiet"`.
Thank you @maelle and @mpadge! I wrapped our cli messages as suggested in https://github.com/AAoritz/nuts/commit/47ee441ea6ea98b30326157b803f81dba5557b42 by @maelle. Here is the result:
```r
rlang::local_options(nuts.verbose = "quiet")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")
withr::deferred_run()
#> Ran 1/1 deferred expressions

rlang::local_options(nuts.verbose = "verbose")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")
#>
#> ── Classifying version of NUTS codes ───────────────────────────────────────────
#> Within groups defined by country:
#> ! These NUTS codes cannot be identified or classified: DEXX and DEZZ.
#> ✔ Unique NUTS version classified.
#> ✔ No missing NUTS codes.
withr::deferred_run()
#> Ran 2/2 deferred expressions

rlang::local_options(rlib_message_verbosity = "quiet")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")
```
Created on 2024-02-14 with reprex v2.0.2
How would you recommend to document this option? Below the examples in the function level help/documentation?
@AAoritz Would you please be so kind as to ask that question on our discussion forum? The issue of where to document verbosity control is important, and would be good to have in that general context, rather than here.
@ropensci-review-bot add @jospueyo to reviewers
@jospueyo added to the reviewers list. Review due date is 2024-03-07. Thanks @jospueyo for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@jospueyo: If you haven't done so, please fill this form for us to update our reviewers records.
The package includes all the following forms of documentation, including a DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 3
The package is useful, very useful indeed, and also well designed. Most of my comments below are just suggestions that IMHO will make the package more intuitive, more robust, or more easily maintainable. Don't take them as mandatory; in the end, this is your package and you should have the last word.
Disclaimer: Apologies if some comments sound tough or rude. English is not my native language.
Function naming does not follow the rOpenSci package guidelines, specifically the `object_verb()` design. Please start your exported functions with `nuts_`; this also facilitates autocompletion.
Variables in vignettes should use underscores and not dots, which can clash with S3 methods.
The `group_vars` argument could be changed to `.by`, to be consistent with `dplyr` functions.
`convert_nuts_levels` might be changed to `nuts_aggregate` to be explicit that the level change is only aggregation and not disaggregation.
Provide wrapper functions to inspect sub-elements of `classify_nuts`, something like:

- `nuts_get_data()`
- `nuts_get_version()`
- `nuts_get_missing()`
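A hypothetical sketch of such accessors, assuming `classify_nuts()` returns a named list with elements `data`, `versions_data`, and `missing_data` (the names suggested earlier in this thread; everything here is illustrative, not the package's actual API):

```r
# Thin wrappers that hide the list structure from the user
nuts_get_data <- function(classified) classified[["data"]]
nuts_get_version <- function(classified) classified[["versions_data"]]
nuts_get_missing <- function(classified) classified[["missing_data"]]

# Example with a mock classify_nuts() result
out <- list(data = data.frame(geo = "DE111"), versions_data = 2021, missing_data = NULL)
nuts_get_version(out)
#> [1] 2021
```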
`missing_rm` could also provide some imputation technique, such as using the mean. In the future, it might be linked to imputation packages such as `mice`, for instance.
There are some inconsistencies in your constructions that can make the package harder to maintain, especially if there is a maintainer change in the future. For instance, sometimes you are explicit inside your conditions: `if (condition == FALSE) do something...` vs `if (!condition) do something...`. You should check the constructions you used and try to be consistent. BTW, you should avoid using `T` and `F`; replace them with `TRUE` and `FALSE`.
You should not supply default values for mandatory arguments. Otherwise, if you run the function without providing these values, it raises a cryptic error instead of the usual `Error in classify_nuts() : argument "data" is missing, with no default`.
When an argument accepts a set of values, such as `multiple_versions` or `weights`, it's preferable to pass a vector with all options as the default to show all possible values, e.g. `multiple_versions = c('break', 'most_frequent')`. Then you use the first element, which would be `break` by default. BTW, `break` is a concept that not everybody might be familiar with; maybe using `error` would be more intuitive.
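A minimal sketch of this pattern using base R's `match.arg()` (the function name `resolve_versions` and the value set are made up for illustration, following the `error`/`most_frequent` suggestion above):

```r
# The full set of allowed values is visible in the signature; match.arg()
# picks the first element as the default and errors informatively on
# anything outside the set.
resolve_versions <- function(multiple_versions = c("error", "most_frequent")) {
  match.arg(multiple_versions)
}

resolve_versions()                 # "error" (first element is the default)
resolve_versions("most_frequent")  # "most_frequent"
```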
When you check whether an argument is one of the accepted values, you should contemplate that users could pass a vector of length > 1 (never underestimate end users). If this happens, an error is given because `if()` can only accept length-1 vectors. You should do something like `if (!all(nuts_code %in% colnames(data)))`.
The "nuts.verbose" option should be mentioned in the help of each function. Otherwise, users cannot know what options they have for verbosity.
In the table about converting relative values, some `*` are interpreted by markdown as italics. This makes it impossible to interpret the weighted-average formulas. You should fix this, possibly by converting to equations or to code. You will have to try.
All the scoped verbs (`*_at`) were superseded in current versions of dplyr. Replace them with new forms such as `summarize(across())`.
The pipe from `magrittr` and `as_tibble` from the `tibble` package can be imported from `dplyr`, so you can remove two packages from Imports. You will depend on tibble and magrittr anyway, but indirectly through `dplyr`.
Likewise, if you don't want to depend on `stringr`, you can easily replace `str_remove()` with `gsub()`, passing `""` as the replacement, and `str_detect()` with `grepl()`.
#### `classify_nuts` (suggested name: `nuts_classify`)

When setting a custom class, append it rather than replace the existing classes: `class(x) <- c("new_class", class(x))`. Otherwise, if the object is no longer a list but only your custom class, this could break some methods for lists and cause unexpected behavior if the output of `classify_nuts` is used for something other than feeding your other functions.

#### `convert_nuts_version` (suggested name: `nuts_convert_version`)

L72: There is `inherits(object, class)` to check for a class. It's more robust than `class(x) == "your_class"`.

L183: You could do the subset right after the `get()`. This way, you avoid keeping the entire table in memory, e.g. `cw <- get("cross_walks"); cross_walks <- cw[cw$to_version == to_version, ]`.
#### `convert_nuts_level` (suggested name: `nuts_aggregate`)

Why can `to_level` be set to 3 if it's not possible to convert from level 1 or 2 to level 3? You should add this to the description. My suggested name also makes this clearer.

L151: This chunk of code is repeated in `convert_nuts_version`. It is complex enough to avoid repeating yourself: I suggest moving the whole chunk to a new function and calling that function in both places. This makes your code more maintainable.
Looking forward to your responses!
Thanks a lot @jospueyo! :pray:
Could you please add the number of hours you spent reviewing near "Estimated hours spent reviewing:" in the template? Thanks!
Thanks so much @jospueyo! All of this is super helpful.
We will start addressing the recommendations asap!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/623#issuecomment-1951446662 time 3
Logged review for jospueyo (hours: 3)
Briefly describe any working relationship you have (had) with the package authors.
The package includes all the following forms of documentation:
- [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [ ] Installation instructions: for the development version of package and any non-standard dependencies in README
- [ ] Vignette(s): demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
#### Functionality
Estimated hours spent reviewing: 2
The package could be really useful for many applications where a spatio-temporal question is involved. At the time of my review, the authors had already addressed some of the comments of reviewer 1, so excuse me if some of my comments are outdated.
You have an error: an extra "/". It should be `pak::pak("AAoritz/nuts")` and not `pak::pak("AAoritz/nuts/")`.
At first load I had the message:
Error: package or namespace load failed for 'nuts' in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace 'vctrs' 0.6.3 is already loaded, but >= 0.6.4 is required
I had to close and reopen R/Rstudio then it worked fine.
The Rmd file has been updated following the previous recommendations by @jospueyo concerning the naming of the functions (which is good), but not the HTML version https://aaoritz.github.io/nuts/articles/nuts-vignette.html. The HTML version of the vignette needs to be rebuilt and pushed for online display.
In the vignette, the authors do not explicitly load the libraries required for the data management in the examples, like the meta-library tidyverse with dplyr for the filter() function and other data-management functions.
In the vignette, maps are displayed with various NUTS over time. Even though this is not the goal of the nuts package, a note mentioning which package(s) (or software) were used to create those maps could be of interest to the reader.
The function nuts_test_multiple_versions does not have an example. Would it be possible to have at least one?
After a nuts_convert_version(), it might be interesting to have a helper function that computes the estimated variation of the values between the different NUTS versions. It could be the difference (increase or decrease) in absolute or relative frequency at the higher NUTS level, like the 7.7% that you estimated in your first manure example (Spatial interpolation in a nutshell).
Hope it helps,
Best, Nolwenn
Amazing, @nolwenn. Thank you for the feedback!
(And sorry for already starting to revise the package while you were still reviewing.)
Thanks a lot @nolwenn for your review! :pray:
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/623#issuecomment-1961501137 time 2
Logged review for nolwenn (hours: 2)
@AAoritz @krausewe both reviews are in! :tada:
Many thanks @jospueyo for your time and effort! We highly appreciate your thorough and constructive feedback. The package has greatly improved thanks to your comments. To summarize our reply: we have no objections and adopted all comments, except for the 3 points below that may require further discussion.
General comments
Function naming: https://github.com/AAoritz/nuts/commit/2e0f4c37a6005af8d343aeb85c8144e1fffa2de4
Variables in vignettes should use underscores: https://github.com/AAoritz/nuts/commit/70b2dec3bd2fa5651757975b2ea07b2738cbb368
convert_nuts_levels might be changed to nuts_aggregate: https://github.com/AAoritz/nuts/commit/2e0f4c37a6005af8d343aeb85c8144e1fffa2de4
Provide wrapper functions for classify_nuts: https://github.com/AAoritz/nuts/commit/f3db2b79da2d134c37002e0d5038e9e43fb85b36
Be consistent inside your conditions: `if (condition == FALSE) do_something()` vs `if (!condition) do_something()`; and avoid using T and F, replacing them with TRUE and FALSE: https://github.com/AAoritz/nuts/commit/abf45e4531fde33b5b7878f6476830b1180165d4
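A small base-R illustration of why the review recommends `TRUE`/`FALSE` over `T`/`F` (the function name here is made up for the example):

```r
# T and F are ordinary variables that can be masked, while TRUE and FALSE
# are reserved words; code relying on T/F is fragile.
check_flag <- function(x) {
  # `if (!x)` is preferred over `if (x == FALSE)`
  if (!x) "condition is FALSE" else "condition is TRUE"
}

T <- 0  # perfectly legal in R: any later `if (T)` now silently tests 0

check_flag(FALSE)  # "condition is FALSE"
check_flag(TRUE)   # "condition is TRUE"
```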
You should not supply default values in mandatory arguments: https://github.com/AAoritz/nuts/commit/474c89d64e2721f8b9d45313c8e5230991dd23c1
When an argument accepts a set of values, such as multiple_versions or weights, it’s preferable to pass a vector with all options as default to show all possible values: https://github.com/AAoritz/nuts/commit/501347840b685a9dc3fa5e409b2fa1bf6dd56372
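This pattern is usually paired with `match.arg()` in base R: the default is the full vector of allowed values, the first element acts as the actual default, and anything else raises an informative error. (The function and argument names below are illustrative, not the package's actual API.)

```r
# Sketch of the "vector of options as default" idiom with match.arg():
pick_strategy <- function(multiple_versions = c("error", "most_frequent")) {
  # match.arg() picks the first value when the default vector is passed,
  # validates a user-supplied value otherwise
  match.arg(multiple_versions)
}

pick_strategy()                  # "error" (first value is the default)
pick_strategy("most_frequent")   # explicit, validated choice
```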
The “nuts.verbose” options should be mentioned in the help of each function: https://github.com/AAoritz/nuts/commit/94accd4a469472183bbfce1a604444d28bbd89e3
Documentation
- In the table about converting relative values, some * are interpreted by markdown as italics: https://github.com/AAoritz/nuts/commit/7e643c700b25ad04bb1f60a5f10124d7bd35a0e8
Imports
All the scoped verbs (*_at) were superseded in current versions of dplyr. Replace them with the new forms such as summarize(across()): https://github.com/AAoritz/nuts/commit/ce238d60e57fc209882d8846fd910d2dba8afb38
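A minimal sketch of the superseded-to-current translation, on toy data (not the package's internals):

```r
library(dplyr)

df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6))

# Superseded scoped verb:
# df %>% group_by(g) %>% summarize_at(vars(x, y), sum)

# Current equivalent with across():
out <- df %>%
  group_by(g) %>%
  summarize(across(c(x, y), sum), .groups = "drop")
```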
The pipe from magrittr and as_tibble from tibble package can be imported from dplyr and you can remove two packages from imports: https://github.com/AAoritz/nuts/commit/ed2ae3ee5e1ce0c38bb1973e234edda8d31fe54c
Likewise, if you don’t want to depend on stringr, you can easily replace str_remove with gsub(), passing "" as the replacement, and str_detect with grepl(): https://github.com/AAoritz/nuts/commit/3ef29c1b859d3b3f99fb4ec2388fc85c5bf39c75
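For reference, `sub()` mirrors `str_remove` (first match only) while `gsub()` mirrors `str_remove_all`; with a single digit run per code both give the same result. A toy illustration (the regex and codes are made up for the example):

```r
codes <- c("DE21", "FR10", "notanuts")

# stringr::str_remove(codes, "[0-9]+")  ->  sub()/gsub() with "" replacement:
gsub("[0-9]+", "", codes)        # "DE" "FR" "notanuts"

# stringr::str_detect(codes, "^[A-Z]{2}[0-9]")  ->  grepl():
grepl("^[A-Z]{2}[0-9]", codes)   # TRUE TRUE FALSE
```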
classify_nuts (suggested name: nuts_classify)
- L310, you should add the class “nuts.classified” instead of overwriting the class “list”: class(x) <- c("new_class", class(x)). Otherwise, if the object is no longer a list but only your custom class, this could break some methods for lists and raise unexpected behavior if the output of classify_nuts is used by anything other than your own functions: https://github.com/AAoritz/nuts/commit/7d7b2b3ee3bff70667cb625c5b969d389619d918
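A minimal sketch of prepending the class rather than overwriting it (the list contents are placeholders, not the package's actual output structure):

```r
x <- list(data = data.frame(a = 1), versions = "2016")

# Prepend the custom class; "list" stays in the class vector:
class(x) <- c("nuts.classified", class(x))

inherits(x, "nuts.classified")  # TRUE
inherits(x, "list")             # TRUE: list methods keep working
```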
convert_nuts_version (suggested name: nuts_convert_version)
L72: There is inherits(object, class) to check a class. It’s more robust than class(x) == "your_class": https://github.com/AAoritz/nuts/commit/a6d88cd2c2606514367a1a90fa1b9226c3803a1e
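The robustness difference is easy to see once an object carries more than one class (toy object below):

```r
y <- structure(list(), class = c("nuts.classified", "list"))

inherits(y, "nuts.classified")  # TRUE, whatever the length of class(y)

# class(y) == "nuts.classified" returns c(TRUE, FALSE): a length-2
# logical, which is unsafe inside if()
class(y) == "nuts.classified"
```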
L183: You could do the subset just after the get(). This way, you avoid keeping the entire table in memory: cross_walks <- get("cross_walks")[cross_walks$to_version == to_version, ]: https://github.com/AAoritz/nuts/commit/3a8749156bcd474bcc8d40a90c79739b1c739044
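A sketch of the suggested pattern with a toy stand-in for the package's internal `cross_walks` table (column names and values invented for the example):

```r
# Toy stand-in for the lazily loaded cross_walks dataset:
cross_walks <- data.frame(
  to_version = c("2016", "2016", "2021"),
  weight     = c(0.6, 0.4, 1.0)
)
to_version <- "2016"

# Subset right away so only the relevant rows are carried forward:
cw <- get("cross_walks")[cross_walks$to_version == to_version, ]
nrow(cw)  # 2
```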
convert_nuts_level (suggested name: nuts_aggregate)
In the vignettes, you explain that only aggregation is possible, which makes sense. However, this is not reflected in the help page: https://github.com/AAoritz/nuts/commit/21ad775689eabbb3eb86c3063754f08e2dd06263
L151: This chunk of code is repeated in convert_nuts_version. It is complex enough that you should avoid repeating yourself. I suggest moving the whole chunk into a new function and calling that function in both places. This makes your code more maintainable: https://github.com/AAoritz/nuts/commit/38f8e17de84b67edbeacf25b747ce4322147f94b
- When you check whether an argument is one of the accepted values, you should contemplate that users could pass a vector of length > 1 (never underestimate end-users). If this happens, an error is raised because if() only accepts length-1 vectors. You should do something like if (!all(nuts_code %in% colnames(data))):
We implemented specific checks for the case length(argument) > 1 for single-length arguments. https://github.com/AAoritz/nuts/commit/f052aa4ffd95bc29b8e5a46c6c69fcb123341db9
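The `all()` wrapper collapses the vectorized comparison into a single TRUE/FALSE, so the check stays valid however many column names the user passes (toy data below):

```r
data <- data.frame(nuts_a = 1, nuts_b = 2)
nuts_code <- c("nuts_a", "nuts_z")  # second column does not exist

# Without all(), `nuts_code %in% colnames(data)` is length 2 and
# would make if() fail; with all() we get one logical value:
missing_cols <- !all(nuts_code %in% colnames(data))
if (missing_cols) {
  message("some requested columns are missing from the data")
}
```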
- group_vars argument could be changed to .by:
This is an interesting development in dplyr. It does circumvent the danger of forgetting to ungroup(). It is still marked as experimental, though. Would it make sense to wait with this? https://github.com/tidyverse/dplyr/blob/HEAD/R/by.R
- missing_rm could also provide some imputation technique, such as using the mean. In the future, it might be linked to imputation packages such as mice:
This is a very interesting suggestion. However, we tend to think of imputation as a separate process from conversion/aggregation. Currently, imputation with the mice package could be done by joining a subset of data(all_nuts_codes) to the user's dataset and imputing across the universe of NUTS regions. Moreover, imputation with the method used by the mice package (Van Buuren, S. (2018). Flexible Imputation of Missing Data) comes with many different assumptions whose legitimacy is highly context-dependent. We would currently be more comfortable leaving this task to the users. We do have some ideas for dealing with missing observations; please see our discussion with @nolwenn below.
Thank you for your great input @nolwenn! Those comments are really valuable for this package. We adopted all of your suggestions with the exception of two points that we would like to discuss further.
It should be r pak::pak("AAoritz/nuts") and not r pak::pak("AAoritz/nuts/"): https://github.com/AAoritz/nuts/commit/655460e1af7af0e6ac730c4802226b9613bcca0e
The HTML version of the vignette needs to be rebuilt and pushed for online display: https://github.com/AAoritz/nuts/commit/764b0f4381e91303aa4dc401ac3e44d8ce9ddff8
In the vignette, the author does not explicitly load the libraries required for the data management in the examples, such as the tidyverse meta-package with dplyr for filter() and other data-management functions: https://github.com/AAoritz/nuts/commit/ebbb2b8752341c2fbb53eae0e704e1123f8e27dc
In the vignette, maps are displayed with various NUTS versions over time. Even though creating maps is not the goal of the nuts package, a note mentioning which package(s) (or software) were used to create those maps could be of interest to the reader: https://github.com/AAoritz/nuts/commit/8a1d3dda0af1880a8aa297ae40b6f27c83e0f1ec
The function nuts_test_multiple_versions does not have an example: https://github.com/AAoritz/nuts/commit/813ad7c006f85232d27b7e17d09061f62d808dba
- Error: package or namespace load failed for ‘nuts’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace ‘vctrs’ 0.6.3 is already loaded, but >= 0.6.4 is required:
We currently don't see where this comes from. Does this error persist with the most recent version?
- After a nuts_convert_version(), it might be interesting to have a helper function that computes the estimated variations of the values between the different NUTS versions. It could be the difference (increase or decrease) in absolute or relative frequency at the higher NUTS level, like the 7.7% you estimated in your first manure example (Spatial interpolation in a nutshell):
This is an interesting proposal. If we understand your point correctly, we could include an option in nuts_convert_version() that returns, e.g., a column in the converted dataset containing more details on the conversion. It could for instance be a string recording the path by which a NUTS region was converted, similar to our explanation of the methodology in the vignette. For instance, in the case where a region ZZ72 is created from regions ZZ68 and ZZ70, the converted data set would have a column nuts_code equal to ZZ72 and a new variable conversion_path equal to 0.6 * ZZ68 + 0.2 * ZZ70. Currently, this type of information can be extracted by sub-setting and transforming the tibble data(cross_walks).
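A base-R sketch of how such a `conversion_path` string could be assembled (region codes and weights taken from the ZZ72 example above; the column itself is hypothetical, not an existing package feature):

```r
# Source regions and interpolation weights from the example:
from <- c("ZZ68", "ZZ70")
w    <- c(0.6, 0.2)

# Paste each weighted term, then join them with " + ":
conversion_path <- paste(paste0(w, " * ", from), collapse = " + ")
conversion_path  # "0.6 * ZZ68 + 0.2 * ZZ70"
```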
Along similar lines, we have thought about an option that reports, or makes use of, the share of conversion/aggregation weights missing due to missing regions. For instance, when region ZZ72 is created mainly from ZZ68 and only marginally from ZZ70, and the latter is missing, e.g. ZZ72 = 0.8 * ZZ68 + 0.01 * NA, setting ZZ70 to 0 by using missing_rm = TRUE would be less dramatic than when the weight placed on ZZ70 was larger. Computing the share of weights missing (in this example 0.01 / 0.81), we could change the missing_rm option to a threshold at which users feel confident assuming 0 for the missing regions (e.g. missing_rm > 0.02).
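The proposed missing-weight share can be sketched in a few lines of base R (all names and numbers come from the ZZ72 example above; the threshold rule is the hypothetical one discussed, not an implemented feature):

```r
# ZZ72 = 0.8 * ZZ68 + 0.01 * ZZ70, with ZZ70 missing:
weights <- c(ZZ68 = 0.8, ZZ70 = 0.01)
values  <- c(ZZ68 = 100, ZZ70 = NA)

# Share of total weight attached to missing source regions:
missing_share <- sum(weights[is.na(values)]) / sum(weights)
round(missing_share, 4)  # 0.0123, i.e. 0.01 / 0.81

# Proposed rule: assume 0 for missing regions only below a threshold
missing_share > 0.02     # FALSE: below the example threshold of 0.02
```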
Many thanks! @krausewe and I are looking forward to both of your replies!
@AAoritz @krausewe thank you! Could you please record your response for the bot? https://devdevguide.netlify.app/bot_cheatsheet#submit-response-to-reviewers
@jospueyo @nolwenn could you please read the response of the authors and respond? Once you are happy with the changes, please use the approval template https://devdevguide.netlify.app/approval2template
This has been awesome. I enjoyed reviewing your package and monitoring its evolution. I hope you are truly happy with my suggestions, as I am with your responses.
Regarding the .by argument, I didn't know it was still experimental. Otherwise, I wouldn't have recommended the change.
I also agree with your opinion about imputation of missing values.
Estimated hours spent reviewing: 3.5
@ropensci-review-bot submit response https://github.com/ropensci/software-review/issues/623#issuecomment-1976920666
Logged author response!
@AAoritz, @krausewe: please post your response with @ropensci-review-bot submit response <url to issue comment>
if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
Date accepted: 2024-03-14
Submitting Author Name: Moritz Hennicke
Submitting Author Github Handle: @AAoritz
Other Package Authors Github handles: @krausewe
Repository: https://github.com/AAoritz/nuts/
Version submitted: 0.0.0.9000
Submission type: Standard
Editor: @maelle
Reviewers: @nolwenn, @jospueyo
Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data munging: Linking regional data from different sources at the level of NUTS codes is often complicated by the usage of different versions or varying levels of geographical granularity across sources. The package can be used to harmonize NUTS versions and levels across different data sources using spatial interpolation.
Data validation and testing: The package includes routine tasks to test for the validity and completeness of NUTS codes.
Geospatial data: NUTS codes are the dominant format for European regional data.
The target audience are academics, journalists and data scientists interested in European regional data. Users who want to exploit changes within NUTS regions over time face the challenge that administrative boundaries are redrawn over time. The package enables the construction of consistent panel data across NUTS regions and over time through the harmonization of NUTS regions to one common version or level of granularity.
To our knowledge, there is currently no package targeted at the conversion of NUTS versions using spatial interpolation. The regions package allows re-coding of NUTS codes from version to version without spatial interpolation; it offers some code-validation routines, but no automated detection of the NUTS version used.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Code of conduct