ropensci / software-review

rOpenSci Software Peer Review.

'nuts': Convert European Regional Data #623

Closed AAoritz closed 3 months ago

AAoritz commented 5 months ago

Date accepted: 2024-03-14
Submitting Author Name: Moritz Hennicke
Submitting Author Github Handle: @AAoritz
Other Package Authors Github handles: @krausewe
Repository: https://github.com/AAoritz/nuts/
Version submitted: 0.0.0.9000
Submission type: Standard
Editor: @maelle
Reviewers: @nolwenn, @jospueyo

Archive: TBD
Version accepted: TBD
Language: en


Package: nuts
Title: 'nuts': Convert European Regional Data
Version: 0.0.0.9000
Authors@R: c(person("Moritz", "Hennicke", email = "AAoritz@posteo.de", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-6811-1821")),
         person("Werner", "Krause", email = "werner.krause@uni-potsdam.de", role = c("aut"),
           comment = c(ORCID = "0000-0002-5069-7964")))
Description: Motivated by changing administrative boundaries over time, the 'nuts' package can convert European regional data with NUTS codes between versions (2006, 2010, 2013, 2016 and 2021) and levels (NUTS 1, NUTS 2 and NUTS 3). The package uses spatial interpolation based on granular (100mx100m) population and land use data provided by the European Commission's Joint Research Center.  
URL: https://AAoritz.github.io/nuts/
BugReports: https://github.com/AAoritz/nuts/
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Suggests: 
    distill,
    eurostat,
    formatR,
    ggalluvial,
    ggfittext,
    ggplot2,
    ggpubr,
    ggrepel,
    gridExtra,
    kableExtra,
    knitr,
    raster,
    RColorBrewer,
    readr,
    rmarkdown,
    sf,
    terra,
    testthat,
    tidyr
Config/testthat/edition: 3
VignetteBuilder: knitr
Imports:
    crayon,
    dplyr,
    stringr
Depends: 
    R (>= 3.5.0)
LazyData: true

Scope

Data munging: Linking regional data from different sources at the level of NUTS codes is often complicated by the usage of different versions or varying levels of geographical granularity across sources. The package can be used to harmonize NUTS versions and levels across different data sources using spatial interpolation.

Data validation and testing: The package includes routine tasks to test for the validity and completeness of NUTS codes.

Geospatial data: NUTS codes are the dominant format for European regional data.

The target audience is academics, journalists and data scientists interested in European regional data. Users who want to exploit changes within NUTS regions over time face the challenge that administrative boundaries are redrawn over time. The package enables the construction of consistent panel data across NUTS regions and over time by harmonizing NUTS regions to one common version or level of granularity.
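As a minimal sketch of this workflow (based on a call that appears later in this thread; only the classification step is shown, and the conversion step is omitted):

library(dplyr)
library(nuts)

# Sketch: detect the NUTS version of the four-character (NUTS 2) codes in the
# `geo` column of the bundled `patents` data, restricted to Germany in 2012.
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")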

To our knowledge, there is currently no package targeted at the conversion of NUTS versions using spatial interpolation.

The regions package allows for re-coding of NUTS codes from version to version without spatial interpolation. The package offers some code validation routines, but not an automated detection of the NUTS version used.

Yes

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

Code of conduct

ropensci-review-bot commented 5 months ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 5 months ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 5 months ago

Checks for nuts (v0.0.0.9000)

git hash: 815f2a79

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type |package | ncalls|
|:----------|:------------|------:|
|internal |utils | 26|
|internal |base | 25|
|imports |crayon | NA|
|imports |dplyr | NA|
|imports |stringr | NA|
|suggests |distill | NA|
|suggests |eurostat | NA|
|suggests |formatR | NA|
|suggests |ggalluvial | NA|
|suggests |ggfittext | NA|
|suggests |ggplot2 | NA|
|suggests |ggpubr | NA|
|suggests |ggrepel | NA|
|suggests |gridExtra | NA|
|suggests |kableExtra | NA|
|suggests |knitr | NA|
|suggests |raster | NA|
|suggests |RColorBrewer | NA|
|suggests |readr | NA|
|suggests |rmarkdown | NA|
|suggests |sf | NA|
|suggests |terra | NA|
|suggests |testthat | NA|
|suggests |tidyr | NA|
|linking_to |NA | NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

utils

data (26)

base

get (5), paste0 (5), c (4), names (4), attributes (2), by (2), missing (2), list (1)

**NOTE:** No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 7 files) and
- 2 authors
- 1 vignette
- 4 internal data files
- 3 imported packages
- 3 exported functions (median 132 lines of code)
- 3 non-exported functions in R (median 260 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|------:|----------:|:----------|
|files_R | 7| 45.7| |
|files_vignettes | 2| 85.7| |
|files_tests | 4| 79.0| |
|loc_R | 602| 53.3| |
|loc_vignettes | 874| 88.9| |
|loc_tests | 702| 81.2| |
|num_vignettes | 1| 64.8| |
|data_size_total | 625034| 92.9| |
|data_size_median | 156882| 93.1| |
|n_fns_r | 6| 6.6| |
|n_fns_r_exported | 3| 12.9| |
|n_fns_r_not_exported | 3| 5.3| |
|n_fns_per_file_r | 1| 0.2|TRUE |
|num_params_per_fn | 6| 79.0| |
|loc_per_fn_r | 193| 99.1|TRUE |
|loc_per_fn_r_exp | 132| 94.8| |
|loc_per_fn_r_not_exp | 260| 99.5|TRUE |
|rel_whitespace_R | 18| 55.4| |
|rel_whitespace_vignettes | 29| 89.2| |
|rel_whitespace_tests | 12| 68.9| |
|doclines_per_fn_exp | 60| 72.8| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 0| 0.0|TRUE |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/AAoritz/nuts/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AAoritz/nuts/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:----------------------------------------------|:----------|:------|----------:|:----------|
| 7671209982|pages build and deployment with artifacts-next |success |4aab2b | 11|2024-01-26 |
| 7671179570|pkgdown |success |815f2a | 14|2024-01-26 |
| 7671179564|R-CMD-check |success |815f2a | 8|2024-01-26 |
| 7671179577|test-coverage |success |815f2a | 12|2024-01-26 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following check_fail:

1. no_import_package_as_a_whole

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 94.97

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
convert_nuts_level | 24
convert_nuts_version | 21
classify_nuts | 19

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 184 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 22
Lines should not be more than 80 characters. | 116
Use <-, not =, for assignment. | 46


Package Versions

|package |version |
|:--------|:--------|
|pkgstats |0.1.3.9 |
|pkgcheck |0.1.2.11 |


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

AAoritz commented 5 months ago

Hi rOpenSci Team and @ropensci-review-bot!

Thank you for your time and consideration!

mpadge commented 5 months ago

@AAoritz Sorry about the inconvenience there. We've updated our system to detect distill vignettes, so your package will now pass those tests. (The server might take a few days to incorporate those updates; please be patient.)

AAoritz commented 5 months ago

Wow, thanks @mpadge! Looking forward to the review process!

jhollist commented 5 months ago

Yep, thanks @mpadge. Will wait a bit before sending another check request to let those changes work through.

@AAoritz I think this looks good and is a fit for rOpenSci. I will start working on finding a handling editor for your submission!

AAoritz commented 5 months ago

Great news @jhollist! Thank you!

jhollist commented 5 months ago

@ropensci-review-bot check package

ropensci-review-bot commented 5 months ago

Thanks, about to send the query.

ropensci-review-bot commented 5 months ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 5 months ago

Checks for nuts (v0.0.0.9000)

git hash: cf771ced

Package License: MIT + file LICENSE


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type |package | ncalls|
|:----------|:------------|------:|
|internal |utils | 26|
|internal |base | 25|
|imports |crayon | NA|
|imports |dplyr | NA|
|imports |stringr | NA|
|suggests |distill | NA|
|suggests |eurostat | NA|
|suggests |formatR | NA|
|suggests |ggalluvial | NA|
|suggests |ggfittext | NA|
|suggests |ggplot2 | NA|
|suggests |ggpubr | NA|
|suggests |ggrepel | NA|
|suggests |gridExtra | NA|
|suggests |kableExtra | NA|
|suggests |knitr | NA|
|suggests |raster | NA|
|suggests |RColorBrewer | NA|
|suggests |readr | NA|
|suggests |rmarkdown | NA|
|suggests |sf | NA|
|suggests |terra | NA|
|suggests |testthat | NA|
|suggests |tidyr | NA|
|linking_to |NA | NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

utils

data (26)

base

get (5), paste0 (5), c (4), names (4), attributes (2), by (2), missing (2), list (1)

**NOTE:** No imported packages appear to have associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 7 files) and
- 2 authors
- 1 vignette
- 4 internal data files
- 3 imported packages
- 3 exported functions (median 132 lines of code)
- 3 non-exported functions in R (median 260 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|------:|----------:|:----------|
|files_R | 7| 45.7| |
|files_vignettes | 2| 85.7| |
|files_tests | 4| 79.0| |
|loc_R | 602| 53.3| |
|loc_vignettes | 874| 88.9| |
|loc_tests | 702| 81.2| |
|num_vignettes | 1| 64.8| |
|data_size_total | 625034| 92.9| |
|data_size_median | 156882| 93.1| |
|n_fns_r | 6| 6.6| |
|n_fns_r_exported | 3| 12.9| |
|n_fns_r_not_exported | 3| 5.3| |
|n_fns_per_file_r | 1| 0.2|TRUE |
|num_params_per_fn | 6| 79.0| |
|loc_per_fn_r | 193| 99.1|TRUE |
|loc_per_fn_r_exp | 132| 94.8| |
|loc_per_fn_r_not_exp | 260| 99.5|TRUE |
|rel_whitespace_R | 18| 55.4| |
|rel_whitespace_vignettes | 29| 89.2| |
|rel_whitespace_tests | 12| 68.9| |
|doclines_per_fn_exp | 60| 72.8| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 0| 0.0|TRUE |

---

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check.yaml](https://github.com/AAoritz/nuts/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AAoritz/nuts/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:----------------------------------------------|:----------|:------|----------:|:----------|
| 7694130869|pages build and deployment with artifacts-next |success |38d719 | 13|2024-01-29 |
| 7694097562|pkgdown |success |cf771c | 16|2024-01-29 |
| 7694097559|R-CMD-check |success |cf771c | 10|2024-01-29 |
| 7694097589|test-coverage |success |cf771c | 14|2024-01-29 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

R CMD check generated the following check_fail:

1. no_import_package_as_a_whole

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 94.97

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
convert_nuts_level | 24
convert_nuts_version | 21
classify_nuts | 19

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 184 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 22
Lines should not be more than 80 characters. | 116
Use <-, not =, for assignment. | 46


Package Versions

|package |version |
|:--------|:--------|
|pkgstats |0.1.3.9 |
|pkgcheck |0.1.2.13 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

jhollist commented 5 months ago

@AAoritz looking good! Assigning editor in just a few.

jhollist commented 5 months ago

@ropensci-review-bot assign @maelle as editor

ropensci-review-bot commented 5 months ago

Assigned! @maelle is now the editor

maelle commented 5 months ago

Editor checks:

Editor comments

Many thanks for your submission @AAoritz & @krausewe! Useful package! I know of the French equivalent to nuts, COGugaison, so I was vaguely aware of the idea: https://antuki.github.io/COGugaison/ :smile_cat:

I have a few comments that I'd like to resolve or discuss before I proceed to the reviewer search.

Contributing guide

Docs

User interface

output <- list(
  data = data, 
  versions_data = data_all_versions,  
  missing_data = data_missing_nuts
)

so that the user could then use output[["missing_data"]] instead of output[[3]], which is less readable.

stop("Input 'nuts_code' must be provided as a string.")

could become

cli::cli_abort("Input {.arg nuts_code} must be provided as a string, not {.obj_type_friendly {nuts_code}}.")

(note that this function also requires importing rlang in DESCRIPTION)

and info messages could use cli_alert_ functions https://cli.r-lib.org/reference/cli_alert.html
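For example, a hedged sketch of what such messages could look like (the message text is invented for illustration):

from_version <- "2016"
to_version <- "2021"
# Info and success alerts support glue-style interpolation of local variables:
cli::cli_alert_info("Converting NUTS codes from version {from_version} to {to_version}.")
cli::cli_alert_success("All NUTS codes classified.")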

Code

not_enough_codes <- (length(data$from_code[check_nuts_codes]) < length(data$from_code) &&
               length(data$from_code[check_nuts_codes]) > 0)
if (not_enough_codes)

(Also note the use of && instead of &; see https://lintr.r-lib.org/reference/vector_logic_linter.html.)
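A tiny illustration of the distinction (not code from the package):

codes_ok <- c(TRUE, TRUE, FALSE)
any(codes_ok) & length(codes_ok) > 1         # `&` is vectorised and returns a vector in general
if (any(codes_ok) && length(codes_ok) > 1) { # `&&` is scalar and short-circuits,
  message("some codes matched")              # which is what if() expects
}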

Tests

manure_indic_DE_2003 <- function() {
  manure %>%
    filter(nchar(geo) == 4) %>%
    filter(indic_ag == "I07A_EQ_Y") %>%
    select(-indic_ag) %>%
    filter(grepl("^DE", geo)) %>%
    filter(time == 2003) %>%
    select(-time)
}

and in tests/testthat/test-classify_nuts.R you'd have

test_that("Needs geo var 1", {
  expect_error(
    manure_indic_DE_2003() %>% classify_nuts(nuts_code = NULL),
    "Input 'nuts_code' cannot be NULL."
  )
})

test_that("Needs geo var 2", {
  expect_error(manure_indic_DE_2003() %>% classify_nuts())
})

test_that("nuts_code not valid", {
  expect_error(
    manure_indic_DE_2003() %>% classify_nuts(nuts_code = 1),
    "Input 'nuts_code' must be provided as a string."
  )
})

See https://blog.r-hub.io/2020/11/18/testthat-utility-belt/ and https://r-pkgs.org/testing-design.html + https://r-pkgs.org/testing-advanced.html

Happy to answer any question and to discuss! Thanks again for submitting your package!

AAoritz commented 5 months ago

Hi @maelle! Many thanks for these super useful checks and comments. @krausewe and I will start looking into them now and we will get back to you soon.

krausewe commented 4 months ago

Hi @maelle!

We have addressed the very helpful remarks & recommendations.

Please find below the list of commits.

We are looking forward to the next steps!

Contributing guide

Docs

User interface

Code

Tests

maelle commented 4 months ago

Awesome, thank you! I'll now look for reviewers.

Note that @mpadge's and my tech note is now published: https://ropensci.org/blog/2024/02/06/verbosity-control-packages/

maelle commented 4 months ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 4 months ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/623_status.svg)](https://github.com/ropensci/software-review/issues/623)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

krausewe commented 4 months ago

@ropensci-review-bot.

Added the peer review status badge.

NEWS.md was already created and included.

maelle commented 4 months ago

@ropensci-review-bot add @nolwenn to reviewers

ropensci-review-bot commented 4 months ago

@nolwenn added to the reviewers list. Review due date is 2024-03-01. Thanks @nolwenn for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 4 months ago

@nolwenn: If you haven't done so, please fill this form for us to update our reviewers records.

AAoritz commented 4 months ago

Hi @maelle! We are a bit lost implementing verbosity control with local_options. This example from your blog does not seem to work with cli:

> pkg_message <- function(...) {
+   is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
+   if (is_verbose_mode) {
+     # Options local to this function only; reset on exit!
+     rlang::local_options(rlib_message_verbosity = "verbose")
+   }
+   rlang::inform(...)
+ }
> pkg_message("normal message")
normal message
> rlang::local_options(rlib_message_verbosity = "quiet")
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
> pkg_message("suppressed message")
> rlang::local_options(mypackage.verbose = "verbose")
> pkg_message("reawakened message")
reawakened message

> withr::deferred_run()
Ran 2/2 deferred expressions
> pkg_message <- function(...) {
+   is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
+   if (is_verbose_mode) {
+     # Options local to this function only; reset on exit!
+     rlang::local_options(rlib_message_verbosity = "verbose")
+   }
+   cli::cli_h1(...)
+ }
> pkg_message("normal message")

── normal message ───────────────────────────────────────────────────────────────────────────────────────────────
> rlang::local_options(rlib_message_verbosity = "quiet")
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
> pkg_message("suppressed message")

── suppressed message ───────────────────────────────────────────────────────────────────────────────────────────
> rlang::local_options(mypackage.verbose = "verbose")
> pkg_message("reawakened message")

── reawakened message ───────────────────────────────────────────────────────────────────────────────────────────
> 
maelle commented 4 months ago

Right, it seems specific to rlang (cc @mpadge).

How about something simpler like

pkg_message <- function(...) {
  is_verbose_mode <- (getOption("mypackage.verbose", "quiet") == "verbose")
  if (is_verbose_mode) {
    cli::cli_alert(...)
  }

}

withr::with_options(list("mypackage.verbose" = "quiet"), {
  pkg_message("pof")
})

withr::with_options(list("mypackage.verbose" = "verbose"), {
  pkg_message("pof")
})
#> → pof

Created on 2024-02-13 with reprex v2.1.0

:thinking:

mpadge commented 4 months ago

@AAoritz The options we wrote about in the blog post only apply to the cli_inform(), cli_warn(), and cli_abort() functions, not to any other cli functions. If you want to use others such as cli_h<N>(), you'd need to further adapt the custom handler to not call that at all if verbose == "quiet".
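For instance, a minimal sketch of such a wrapper, assuming a package-level nuts.verbose option (the helper name and the default value are hypothetical):

# Sketch: only emit the cli heading when the package option asks for verbosity;
# otherwise do nothing at all.
nuts_heading <- function(...) {
  if (getOption("nuts.verbose", "verbose") == "verbose") {
    cli::cli_h1(...)
  }
  invisible(NULL)
}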

AAoritz commented 4 months ago

Thank you @maelle and @mpadge! I wrapped our cli message as suggested in https://github.com/AAoritz/nuts/commit/47ee441ea6ea98b30326157b803f81dba5557b42 by @maelle. Here is the result:

rlang::local_options(nuts.verbose = "quiet")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")

withr::deferred_run()
#> Ran 1/1 deferred expressions
rlang::local_options(nuts.verbose = "verbose")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")
#> 
#> ── Classifying version of NUTS codes ───────────────────────────────────────────
#> Within groups defined by country:
#> ! These NUTS codes cannot be identified or classified: DEXX and DEZZ.
#> ✔ Unique NUTS version classified.
#> ✔ No missing NUTS codes.

withr::deferred_run()
#> Ran 2/2 deferred expressions
rlang::local_options(rlib_message_verbosity = "quiet")
df <- patents %>%
  filter(unit == "NR", nchar(geo) == 4, time == 2012) %>%
  filter(grepl("^DE", geo)) %>%
  classify_nuts(data = ., nuts_code = "geo")

Created on 2024-02-14 with reprex v2.0.2

How would you recommend documenting this option? Below the examples in the function-level help/documentation?

mpadge commented 4 months ago

@AAoritz Would you please be so kind as to ask that question on our discussion forum? The issue of where to document verbosity control is important, and would be good to have in that general context, rather than here.

maelle commented 4 months ago

@ropensci-review-bot add @jospueyo to reviewers

ropensci-review-bot commented 4 months ago

@jospueyo added to the reviewers list. Review due date is 2024-03-07. Thanks @jospueyo for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 4 months ago

@jospueyo: If you haven't done so, please fill this form for us to update our reviewers records.

jospueyo commented 4 months ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 3 h


Review Comments

The package is useful, very useful indeed, and also well designed. Most of my comments below are just suggestions that IMHO will make the package more intuitive, more robust, or more easily maintainable. Don't take them as mandatory; in the end, this is your package and you should have the last word.

Disclaimer: Apologies if some comments sound tough or rude. English is not my native language.

General comments

Function naming does not follow the rOpenSci package guidelines, specifically the object_verb() design. Please start your exported functions with nuts_; this also facilitates autocompletion.

Variables in vignettes should use underscores rather than dots, which can clash with S3 methods.

The group_vars argument could be changed to .by, to be consistent with dplyr functions.

convert_nuts_level might be renamed nuts_aggregate to make explicit that the level change is aggregation only, not disaggregation.

Provide wrapper functions to inspect sub-elements of classify_nuts, something like:

missing_rm could also provide some imputation technique, such as using the mean. In the future, it might be linked to imputation packages such as mice.

There are some inconsistencies in your constructions that can make the package harder to maintain, especially if there is a maintainer change in the future. For instance, sometimes you write conditions explicitly, if(condition == FALSE) do something..., and sometimes as if(!condition) do something.... You should check the constructions you used and try to be consistent. BTW, you should avoid using T and F; replace them with TRUE and FALSE.

You should not supply default values for mandatory arguments. Otherwise, if you run the function without providing these values, it raises a cryptic error instead of the usual Error in classify_nuts() : argument "data" is missing, with no default.
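A quick illustration of the point (the function name is hypothetical, not the package code):

# Sketch: with no default, R itself raises the standard, readable error as soon
# as the missing argument is used.
classify_sketch <- function(data, nuts_code) {
  force(data)
  nuts_code %in% colnames(data)
}
classify_sketch()
#> Error in force(data) : argument "data" is missing, with no default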

When an argument accepts a set of values, such as multiple_versions or weights, it is preferable to pass a vector with all options as the default, to show all possible values, e.g. multiple_versions = c('break', 'most_frequent'). Then you use the first element, which would be break by default. BTW, break is a concept that not everybody might be familiar with; maybe using error would be more intuitive.
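A sketch of that pattern with base R's match.arg() (hypothetical function name):

# The vector lists all accepted values, the first one is the default, and any
# other value produces an informative error automatically.
nuts_convert_sketch <- function(data, multiple_versions = c("break", "most_frequent")) {
  multiple_versions <- match.arg(multiple_versions)
  multiple_versions
}
nuts_convert_sketch(data = NULL)                              # "break"
nuts_convert_sketch(data = NULL, multiple_versions = "typo")  # error listing the valid choices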

When you check whether an argument is one of the accepted values, you should contemplate that users could pass a vector of length > 1 (never underestimate end-users). If this happens, an error is raised because if() only accepts length-1 vectors. You should do something like if(!all(nuts_code %in% colnames(data))).

The “nuts.verbose” option should be mentioned in the help of each function. Otherwise, users cannot know what verbosity options they have.
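One way to surface this could be a shared roxygen section repeated (or templated) in each exported function's documentation, e.g.:

#' @section Verbosity:
#'   Messages can be silenced for a session with
#'   `options(nuts.verbose = "quiet")` and re-enabled with
#'   `options(nuts.verbose = "verbose")`.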

Documentation

In the table about converting relative values, some * are interpreted by markdown as italics. This makes it impossible to interpret the weighted-average formulas. You should fix it, possibly by converting them to equations or to code.

Imports

All the scoped verbs (*_at) were superseded in current versions of dplyr. Replace them with the new forms, such as summarize(across()).
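For example (illustrative data, not the package code):

library(dplyr)
df <- tibble(country = c("DE", "DE", "FR"), values = c(1, 2, 3), pop = c(10, 20, 30))
# Superseded scoped verb:
df %>% group_by(country) %>% summarize_at(vars(values, pop), sum)
# Current equivalent with across():
df %>% group_by(country) %>% summarize(across(c(values, pop), sum))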

The pipe from magrittr and as_tibble from the tibble package can be imported from dplyr, so you can remove two packages from Imports. You will still depend on tibble and magrittr, but indirectly through dplyr rather than directly.
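A sketch of the corresponding roxygen imports, assuming dplyr stays in Imports and re-exports both (it does in current versions):

#' @importFrom dplyr %>%
#' @importFrom dplyr as_tibble
NULL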

Likewise, if you don’t want to depend on stringr, you can easily replace str_remove with gsub(), passing "" as the replacement, and str_detect with grepl().
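For example:

x <- c("DE21", "FR10", "DE30")
grepl("^DE", x)      # base equivalent of stringr::str_detect(x, "^DE")
gsub("^DE", "", x)   # base equivalent of stringr::str_remove(x, "^DE")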

classify_nuts (suggested name: nuts_classify)

convert_nuts_version (suggested name: nuts_convert_version)

convert_nuts_level (suggested name: nuts_aggregate)

L151: This chunk of code is repeated in convert_nuts_version. It is complex enough that you should avoid repeating yourself. I suggest moving the whole chunk to a new function and calling that function in both places. This makes your code more maintainable.

Looking forward to your responses!

maelle commented 4 months ago

Thanks a lot @jospueyo! :pray:

Could you please add the number of hours you spent reviewing near "Estimated hours spent reviewing:" in the template? Thanks!

krausewe commented 4 months ago

Thanks so much @jospueyo! All of this is super helpful.

We will start addressing the recommendations asap!

maelle commented 4 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/623#issuecomment-1951446662 time 3

ropensci-review-bot commented 4 months ago

Logged review for jospueyo (hours: 3)

nolwenn commented 4 months ago

nuts package review

Briefly describe any working relationship you have (had) with the package authors.

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 2


Review Comments

The package could be really useful for many applications where a spatio-temporal question is involved. At the time of my review, the authors had already addressed some of the comments of reviewer 1, so you will excuse me if some of my comments are outdated.

Installation instructions (GitHub)

You have an error, an extra "/": it should be pak::pak("AAoritz/nuts") and not pak::pak("AAoritz/nuts/").

At first load I had the message:

Error: package or namespace load failed for ‘nuts’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace ‘vctrs’ 0.6.3 is already loaded, but >= 0.6.4 is required

I had to close and reopen R/RStudio, then it worked fine.

Vignette(s)

The Rmd file has been updated with the previous recommendations by @jospueyo concerning the naming of the functions (which is good), but not the HTML version at https://aaoritz.github.io/nuts/articles/nuts-vignette.html. The HTML version of the vignette needs to be rebuilt and pushed for online display.

In the vignette, the authors do not explicitly load the libraries required for the data management in the examples, like the meta-package tidyverse with dplyr for filter() and other data-management functions.

In the vignette, maps are displayed with various NUTS versions over time. Even though this is not the goal of the nuts package, a note mentioning which package(s) (or software) were used to create those maps could be of interest to the reader.

Functionality

The function nuts_test_multiple_versions does not have an example. Would it be possible to have at least one?

After a nuts_convert_version(), it might be interesting to have a helper function that computes the estimated variation of the values between the different NUTS versions. It could be the difference (increase or decrease) in absolute or relative frequency at the higher NUTS level, like the 7.7% that you estimated in your first manure example ("Spatial interpolation in a nutshell").

Hope it helps,

Best, Nolwenn

krausewe commented 4 months ago

Amazing, @nolwenn. Thank you for the feedback!

(And sorry for already starting to revise the package while you were still reviewing.)

maelle commented 4 months ago

Thanks a lot @nolwenn for your review! :pray:

maelle commented 4 months ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/623#issuecomment-1961501137 time 2

ropensci-review-bot commented 4 months ago

Logged review for nolwenn (hours: 2)

maelle commented 4 months ago

@AAoritz @krausewe both reviews are in! :tada:

AAoritz commented 3 months ago

Review by @jospueyo

Many thanks @jospueyo for your time and effort! We highly appreciate your thorough and constructive feedback; the package has greatly improved thanks to your comments. To summarize our reply: we have no objections and adopted all comments, except for the three points below that may require further discussion.

Adopted points

General comments

Documentation

Imports

classify_nuts (suggested name: nuts_classify)

  • L310, you should add the class “nuts.classified” instead of overwriting the class “list”: class(x) <- c("new_class", class(x)). Otherwise, if the object is no longer a list but only your custom class, this could break some methods for lists and cause unexpected behavior if the output of classify_nuts is used for something other than feeding your other functions: https://github.com/AAoritz/nuts/commit/7d7b2b3ee3bff70667cb625c5b969d389619d918
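A small illustration of the difference (illustrative object, not the package code):

x <- list(data = head(mtcars), missing = character(0))
class(x) <- c("nuts.classified", class(x))  # appends the class, keeps "list"
inherits(x, "list")   # TRUE: list methods keep working
x[["missing"]]        # ordinary list access is unaffected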

convert_nuts_version (suggested name: nuts_convert_version)

convert_nuts_level (suggested name: nuts_aggregate)

Discussion points

  • When you check if an argument is one of the accepted values, you should contemplate that users could pass a vector of length > 1 (never underestimate end-users). If this happens, an error is given because if() can only accept 1-length vectors. You should do something like if(!all(nuts_code %in% colnames(data))):

We implemented specific checks for the case where a vector with length(argument) > 1 is passed to a single-length argument. https://github.com/AAoritz/nuts/commit/f052aa4ffd95bc29b8e5a46c6c69fcb123341db9

  • group_vars argument could be changed to .by:

This is an interesting development in dplyr. It does circumvent the danger of forgetting to ungroup(). It is still marked as experimental, though. Would it make sense to wait on this? https://github.com/tidyverse/dplyr/blob/HEAD/R/by.R

  • missing_rm could also provide some imputation technique, such as using the mean. In the future, it might be linked to imputation packages such as mice:

This is a very interesting suggestion. However, we tend to think of imputation as a separate process from conversion/aggregation. Currently, imputation with the mice package could be done by joining a subset of data(all_nuts_codes) to the user's dataset and imputing across the universe of NUTS regions. Moreover, imputation with the method used by the mice package (Van Buuren, S. (2018). Flexible Imputation of Missing Data) comes with many different assumptions whose legitimacy is highly context-dependent. We would currently be more comfortable leaving this task to the users. We do have some ideas for dealing with missing observations; please see our discussion with @nolwenn below.

Review by @nolwenn

Thank you for your great input @nolwenn! Those comments are really valuable for this package. We adopted all of your suggestions with the exception of two points that we would like to discuss further.

Adopted comments

Discussion points

  • Error: package or namespace load failed for ‘nuts’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): namespace ‘vctrs’ 0.6.3 is already loaded, but >= 0.6.4 is required:

We currently don't see where this comes from. Does this error persist with the most recent version?

  • After a nuts_convert_version(), it might be interesting to have a helping function that computes the estimated variations of the values between the different NUTS versions. It could be the difference (increase or decrease) in absolute or relative frequency at the higher NUT level. Like to get the 7.7% that you estimated in your first manure example (Spatial interpolation in a nutshell)

This is an interesting proposal. If we understand your point correctly, we could include an option in nuts_convert_version() that returns, e.g., a column in the converted dataset containing more details on the conversion. It could for instance be a string that records the path by which a NUTS region was converted, similar to our explanation of the methodology in the vignette. For instance, where a region ZZ72 is created from regions ZZ68 and ZZ70, the converted data set would have a column nuts_code equal to ZZ72 and a new variable conversion_path equal to 0.6 * ZZ68 + 0.2 * ZZ70. Currently, this type of information can be extracted by subsetting and transforming the tibble data(cross_walks).

Along similar lines, we have thought about an option that reports or makes use of the share of conversion/aggregation weights that is missing due to missing regions. For instance, when region ZZ72 is created mainly from ZZ68 and only marginally from ZZ70, and the latter is missing, e.g. ZZ72 = 0.8 * ZZ68 + 0.01 * NA, setting ZZ70 to 0 by using missing_rm = TRUE is less dramatic than when the weight placed on ZZ70 is larger. By computing the share of weights missing (in this example 0.01 / 0.81), we could change the missing_rm option into a threshold at which users feel confident assuming 0 for the missing regions (e.g. missing_rm > 0.02).
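A back-of-the-envelope sketch of that share with the example numbers above (illustrative values only):

w     <- c(ZZ68 = 0.80, ZZ70 = 0.01)  # conversion weights feeding ZZ72
value <- c(ZZ68 = 100,  ZZ70 = NA)    # ZZ70 is missing in the source data
share_missing <- sum(w[is.na(value)]) / sum(w)
share_missing                         # 0.01 / 0.81, roughly 0.012
# A threshold-based missing_rm could then assume 0 for the missing region only
# when share_missing stays below, say, 0.02.
sum(w * value, na.rm = TRUE)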


Many thanks! @krausewe and I are looking forward to both of your replies!

maelle commented 3 months ago

@AAoritz @krausewe thank you! Could you please record your response for the bot? https://devdevguide.netlify.app/bot_cheatsheet#submit-response-to-reviewers

@jospueyo @nolwenn could you please read the response of the authors and respond? Once you are happy with the changes, please use the approval template https://devdevguide.netlify.app/approval2template

jospueyo commented 3 months ago

Reviewer Response

This has been awesome. I enjoyed reviewing your package and monitoring its evolution. I hope you are truly happy with my suggestions; I certainly am with your responses.

Regarding the .by argument, I didn't know it was still experimental; otherwise, I wouldn't have recommended the change.

I also agree with your opinion about imputation of missing values.

Final approval (post-review)

Estimated hours spent reviewing: 3.5

AAoritz commented 3 months ago

@ropensci-review-bot submit response https://github.com/ropensci/software-review/issues/623#issuecomment-1976920666

ropensci-review-bot commented 3 months ago

Logged author response!

ropensci-review-bot commented 3 months ago

@AAoritz, @krausewe: please post your response with @ropensci-review-bot submit response <url to issue comment> if you haven't done so already (this is an automatic reminder).

Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html