Closed by @RWParsons 1 year ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
Editor check started
:wave:
Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
@ropensci-review-bot help
Hello @RWParsons, here are the things you can ask me to do:
# Add an author's response info to the rOpenSci logs
@ropensci-review-bot submit response <AUTHOR_RESPONSE_URL>
# List all available commands
@ropensci-review-bot help
# Show our Code of Conduct
@ropensci-review-bot code of conduct
# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name
# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name
# Various package checks
@ropensci-review-bot check package
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: 0aecd6f6
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL (>= 3)
rOpenSci Statistical Standards (`srr` package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(...)`.

- **base**: list (31), c (26), lapply (10), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), abs (3), for (3), length (3), names (3), return (3), seq_len (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)
- **magrittr**: %>% (45)
- **predictNMB**: get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)
- **ggplot2**: aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)
- **stats**: rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)
- **dplyr**: mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)
- **cutpointr**: cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)
- **tibble**: as_tibble (2), rownames_to_column (1), tibble (1)
- **tidyr**: expand_grid (2), pivot_longer (2)
- **utils**: data (4)
- **rlang**: sym (2)
- **parallel**: clusterExport (1)
- **pmsampsize**: pmsampsize (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1266| 74.2| |
|loc_vignettes | 829| 88.0| |
|loc_tests | 563| 77.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.3| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 21| 75.9| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions) [![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3746509530|pages build and deployment |success |5f5226 | 51|2022-12-21 |
| 3746435406|pkgcheck |success |0aecd6 | 3|2022-12-21 |
| 3746435412|pkgdown |success |0aecd6 | 54|2022-12-21 |
| 3746435410|R-CMD-check |success |0aecd6 | 117|2022-12-21 |
| 3746435408|test-coverage |success |0aecd6 | 16|2022-12-21 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 91.41

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_inbuilt_cutpoint | 25
plot.predictNMBscreen | 20

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 11
Lines should not be more than 80 characters. | 79
:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars
|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |
This package is in top shape and may be passed on to a handling editor
Hello again @RWParsons ,
Just a note that most editors are on a holiday break. We will pick up assigning a handling editor as soon as everyone is back in early January. Have a nice break! ✨
No worries at all - thanks for the update @annakrystalli 👌!
@annakrystalli I just snuck in a quick change to the package to add progress bars, which I had been meaning to do before submission but forgot about until yesterday. I've updated the description in the OP of this issue and will re-run the bot to check the package. Hope this is all fine - thanks!
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: 2fef46dd
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL (>= 3)
rOpenSci Statistical Standards (`srr` package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |pbapply | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(...)`.

- **base**: list (31), c (26), lapply (9), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), seq_len (4), abs (3), for (3), length (3), names (3), return (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)
- **magrittr**: %>% (45)
- **predictNMB**: get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)
- **ggplot2**: aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)
- **stats**: rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)
- **dplyr**: mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)
- **cutpointr**: cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)
- **tibble**: as_tibble (2), rownames_to_column (1), tibble (1)
- **tidyr**: expand_grid (2), pivot_longer (2)
- **utils**: data (4)
- **rlang**: sym (2)
- **parallel**: clusterExport (1)
- **pbapply**: pblapply (1)
- **pmsampsize**: pmsampsize (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1304| 74.8| |
|loc_vignettes | 831| 88.0| |
|loc_tests | 591| 78.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.5| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 22| 77.7| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions) [![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3835432164|pages build and deployment |success |7bf6a7 | 53|2023-01-04 |
| 3835300178|pkgcheck |success |2fef46 | 5|2023-01-04 |
| 3835300175|pkgdown |success |2fef46 | 56|2023-01-04 |
| 3835300177|R-CMD-check |success |2fef46 | 119|2023-01-04 |
| 3835300174|test-coverage |success |2fef46 | 18|2023-01-04 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 90.64

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_inbuilt_cutpoint | 25
plot.predictNMBscreen | 20
screen_simulation_inputs | 16

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 11
Lines should not be more than 80 characters. | 79
:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars
|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot check srr
:heavy_check_mark: This package complies with > 50% of all standards and may be submitted.
Thank you @RWParsons for the updates.
We are now looking for a handling editor 👍
Dear @RWParsons,
Today starts my rotation as EiC so the role of @annakrystalli is now mine. I've watched the conversations here and among editors and I see we're still looking for a handling editor. Thanks for your patience.
Hi @maurolepore,
No worries - please let me know if I can do anything to help progress this.
@ropensci-review-bot assign @adamhsparks as editor
Assigned! @adamhsparks is now the editor
Hi @RWParsons, I'll be your handling editor for this submission. This is my first stats package submission, so please bear with me, but I think we'll be alright. ☺️
I've performed the editor checks on your package and have a few items that should be addressed before the reviewers start their work.
- There is a `LazyData: true` field in the DESCRIPTION file, but there is no data in the package, so 'LazyData' should be omitted from DESCRIPTION.
- NB: this package now depends on R (>= 3.5.0). WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: `predictNMB/tests/testthat/fixtures/predictNMBscreen_object.rds`, `predictNMB/tests/testthat/fixtures/predictNMBsim_object.rds`
- Please run `goodpractice::gp()` and address the results as necessary and possible.
- `spelling::spell_check_package()` returns several hits. Please set up spell-checking and specify the language in the DESCRIPTION file. You can do this with the {spelling} package. As a fellow Aussie, I recognise many of these are not misspelled, but {spelling} will allow you to build a wordlist to accept them so they are not flagged in future checks.
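For reference, setting this up with {spelling} might look like the following (a minimal sketch, assuming it is run from the package root):

```r
# Sketch of a {spelling} setup; run from the package root.
# install.packages("spelling")

# Add a Language field to DESCRIPTION, a tests/spelling.R stub,
# and an inst/WORDLIST file:
spelling::spell_check_setup(pkg = ".", lang = "en-US")

# Review the current hits:
spelling::spell_check_package(".")

# Accept the remaining (correctly spelled) words into inst/WORDLIST:
spelling::update_wordlist(".")
```

After the wordlist is populated, subsequent checks only flag newly introduced words.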
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
WORD FOUND IN
AUD detailed-example.Rmd:36,37
behaviour summarising-results-with-predictNMB.Rmd:250
categorised introduction-to-predictNMB.Rmd:63
centre introduction-to-predictNMB.Rmd:117
CMD README.md:8
README.Rmd:28
Codecov README.md:12
README.Rmd:30
colour plot.predictNMBsim.Rd:49,50
Colour plot.predictNMBsim.Rd:59
coloured plot.predictNMBscreen.Rd:60
plot.predictNMBsim.Rd:37
colours plot.predictNMBsim.Rd:48
summarising-results-with-predictNMB.Rmd:221
FN do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:24,67
introduction-to-predictNMB.Rmd:40,42,63
FNs creating-nmb-functions.Rmd:121
FP do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:23,65
introduction-to-predictNMB.Rmd:39,42,63
frac creating-nmb-functions.Rmd:64
grey summarising-results-with-predictNMB.Rmd:221
inb make_summary_table.predictNMBscreen.Rd:26,54
make_summary_table.predictNMBsim.Rd:25,50
make_summary_table.Rd:26,55
plot.predictNMBscreen.Rd:38
plot.predictNMBsim.Rd:29
INB introduction-to-predictNMB.Rmd:133
summarising-results-with-predictNMB.Rmd:110
magrittr pipe.Rd:10,12
maximise creating-nmb-functions.Rmd:32
detailed-example.Rmd:32
introduction-to-predictNMB.Rmd:25
maximises do_nmb_sim.Rd:41
introduction-to-predictNMB.Rmd:160
maximising introduction-to-predictNMB.Rmd:194
minimisation introduction-to-predictNMB.Rmd:123
minimises do_nmb_sim.Rd:44
minimising do_nmb_sim.Rd:44
screen_simulation_inputs.Rd:53
detailed-example.Rmd:95
introduction-to-predictNMB.Rmd:143
modelling creating-nmb-functions.Rmd:35,127,285
nmb make_summary_table.predictNMBscreen.Rd:26,27,53
make_summary_table.predictNMBsim.Rd:25,26,49
make_summary_table.Rd:26,27,54
plot.predictNMBscreen.Rd:38,39
plot.predictNMBsim.Rd:29,30
NMB do_nmb_sim.Rd:41,56,62,66,98
evaluate_cutpoint.Rd:5,16,19,23
get_inbuilt_cutpoint.Rd:20
get_nmb_sampler.Rd:5,59
get_thresholds.Rd:5,14,24
make_summary_table.predictNMBscreen.Rd:53,54
make_summary_table.predictNMBsim.Rd:49,50
make_summary_table.Rd:54,55
screen_simulation_inputs.Rd:42,46
description:2
README.md:21,30,48,73,128,171
README.Rmd:37,41,55,75,126,146
creating-nmb-functions.Rmd:2,17,19,32,39,60,121,123,150,187,207,208,231,245
detailed-example.Rmd:47,68,95,99
introduction-to-predictNMB.Rmd:23,25,35,61,65,83,104,117,145,160
summarising-results-with-predictNMB.Rmd:85,88,99,110
optimisation introduction-to-predictNMB.Rmd:123
optimising do_nmb_sim.Rd:41
screen_simulation_inputs.Rd:53
detailed-example.Rmd:95,123,178
introduction-to-predictNMB.Rmd:143
summarising-results-with-predictNMB.Rmd:221
ORCID predictNMB-package.Rd:23,27,28,33,34
performant README.md:21
README.Rmd:37
pkgcheck README.md:16
README.Rmd:32
predictNMBscreen print.predictNMBscreen.Rd:5,18
predictNMBsim print.predictNMBsim.Rd:5,18
PSA creating-nmb-functions.Rmd:69
QALY creating-nmb-functions.Rmd:212
QALYs get_nmb_sampler.Rd:28
creating-nmb-functions.Rmd:204,210,212
detailed-example.Rmd:32
randomised detailed-example.Rmd:28
roc do_nmb_sim.Rd:48
stabilised summarising-results-with-predictNMB.Rmd:221
summarise make_summary_table.predictNMBscreen.Rd:26
make_summary_table.predictNMBsim.Rd:25
make_summary_table.Rd:26
plot.predictNMBscreen.Rd:38
plot.predictNMBsim.Rd:29
summarising-results-with-predictNMB.Rmd:26
summarising plot.predictNMBscreen.Rd:36
introduction-to-predictNMB.Rmd:125
Summarising README.md:167
README.Rmd:144
summarising-results-with-predictNMB.Rmd:2
TP do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:21,64
introduction-to-predictNMB.Rmd:37,42
unintuitive creating-nmb-functions.Rmd:44
visualise plot.predictNMBscreen.Rd:82
plot.predictNMBsim.Rd:79
README.md:95
README.Rmd:104
introduction-to-predictNMB.Rmd:117,133
summarising-results-with-predictNMB.Rmd:26,85,88,99,110
visualised introduction-to-predictNMB.Rmd:117
visualising README.md:111
README.Rmd:112
Whitty detailed-example.Rmd:180
WTP creating-nmb-functions.Rmd:204,212
detailed-example.Rmd:32
youden do_nmb_sim.Rd:42
detailed-example.Rmd:123
Youden do_nmb_sim.Rd:42
creating-nmb-functions.Rmd:231
detailed-example.Rmd:95,97,144,178
introduction-to-predictNMB.Rmd:160,194
In the README, there is an incomplete thought on line 79, "cl: (Optional) users can pass a cluster as the. If it is passed, the simulations are run in parallel (faster).", emphasis mine.
Admittedly, this isn't my area of research, but I think that prospective users would benefit from a description of what the fields in fx_nmb() are, and from a note that get_nmb_sampler() returns a function rather than a plain object, as this is unusual behaviour in R.
The introductory vignette does cover this, so maybe just a pointer to that section of the document is sufficient for the uninitiated (like me).
There is no "CONTRIBUTING.md", please add one.
And lastly, @RWParsons, could you please add the rOpenSci under-review badge to your README?
[![](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)
I'm looking for reviewers now.
Reviewers: Due date:
Hi @adamhsparks,
This is my first rOpenSci submission, so I'm bound to make mistakes here too!
Thanks for the review - I'll be working on them this week.
I have a CONTRIBUTING.md
file here (https://github.com/RWParsons/predictNMB/blob/master/.github/CONTRIBUTING.md) Is this in the wrong place?
Thanks!
{spelling} will let you update the wordlist to include any words you know are correct. In the past I have used "EN-AU", but it has occasionally not worked properly for me, so I just update the wordlist, leave the language as "EN-US", and use {spelling} to ensure I haven't introduced any new spelling errors after confirming what I wanted for Australian spellings, names, etc.
I missed the CONTRIBUTING.md, sorry. It is in the right spot. Just me getting ahead of myself. :)
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @Tinula-kariyawasam to reviewers
@Tinula-kariyawasam added to the reviewers list. Review due date is 2023-02-27. Thanks @Tinula-kariyawasam for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@Tinula-kariyawasam: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @emitanaka to reviewers
@emitanaka added to the reviewers list. Review due date is 2023-02-28. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records.
:calendar: @Tinula-kariyawasam you have 2 days left before the due date for your review (2023-02-27).
:calendar: @emitanaka you have 2 days left before the due date for your review (2023-02-28).
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no [conflicts of interest](https://devguide.ropensci.org/policies.html#coi) for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
The package includes all the following forms of documentation, including `URL`, `BugReports` and `Maintainer` fields in DESCRIPTION (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 4.5
Hi @RWParsons and other authors,
This is a neat package that I imagine will be helpful in decision making for clinical scenarios! The following are points to address some of the missing components from the rOpenSci guideline and some suggestions to hopefully make the package better. Some small typos/grammar fixes were put in as a pull request.
Contribution guideline appears to be missing (although I note that @ropensci-review-bot seems to indicate it has found it). All other points under "Community guidelines" are satisfied.
I recommend calling `plot` as `autoplot` instead. The general convention is that `plot` is for base plots and `autoplot` is for ggplot outputs. An alternative suggestion is to make it a synonym: `autoplot.predictNMBsim <- plot.predictNMBsim`.
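The synonym route could be sketched like this (assuming the package already defines a `plot.predictNMBsim()` method and imports ggplot2; the roxygen tags are illustrative):

```r
# Sketch: register autoplot() as an alias for the existing plot method.
# plot.predictNMBsim() is assumed to exist elsewhere in the package.
#' @importFrom ggplot2 autoplot
#' @export
autoplot.predictNMBsim <- function(object, ...) {
  plot(object, ...)  # dispatches to plot.predictNMBsim()
}
```

This keeps a single implementation while supporting both calling conventions.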
I suggest removing the `extra_theme` argument from `plot()` and getting people to append a theme to the result of `plot()`.
I find the `return_all_methods` argument in `get_inbuilt_cutpoint()` a bit awkward; it isn't consistent with the main goal of the function (i.e. returning a cutpoint). Would it be better to create another function that returns all inbuilt methods? E.g. `get_inbuilt_cutpoint_methods()`, returning a vector of method names.
I suggest renaming the `ci` argument to `conf.level`, as in `t.test()`, or using a more expressive argument name.
For `make_summary_table()`, I wonder if you want to leverage the S3 generic `summary()`?
For `make_summary_table()`, I kind of think you can combine the `what` and `rename_vector` arguments.
Some performance checks for the ones I know the correct value of:

`get_sample()` - looks good:

```r
# should be roughly 0.7
yardstick::roc_auc(get_sample(0.7, 10000, 0.1), factor(1 - actual), x)

# should be roughly 0.9 and 0.1
table(get_sample(0.7, 10000, 0.1)$actual) / 10000
```
Other functions, I can't think of straightforward tests but I don't see anything particularly wrong in the code.
All tests and examples seem to be working fine.
I am new to reviewing software so apologies if something in this review is not quite right or missing content I should have included.
Thanks, @emitanaka, for your thoughtful and thorough review.
There is a CONTRIBUTING.md file; it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for; interesting that you had the same comment, though. Also, it does have the proper `URL` and `BugReports` fields in DESCRIPTION.
@Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?
Thanks, @emitanaka for the thorough review and useful feedback - it's definitely improved the quality of the package and taught me a bit more about best practice!
I have implemented all your suggested changes except for combining the `what` and `rename_vector` arguments for `make_summary_table()` (now changed to `summary()`). For this method (and likewise for the `plot()` - now `autoplot()` - method), `what` pertains to the value being presented (selected cutpoint/NMB/INB) whereas `rename_vector` corresponds to the cutpoint selection method. Maybe I'm misunderstanding your point and there's a way to combine these that I'm not thinking of, though!
On the `extra_theme` comment: I was adding this to the graphic, but also `... + theme_bw()`. I have moved the `extra_theme` content to `theme_sim()` and shown how to add it in the documentation and in all examples in the README/vignettes. Do you think that I should also avoid applying `theme_bw()` to the graphics being returned (or at least add an argument for controlling this behaviour)?
I do have a performance test for `get_sample()`, but it was within a poorly named test file. I have renamed the test file to `test-get_sample.R`; it assesses the AUC of the resulting model against the value specified when calling `get_sample()`.
> Thanks, @emitanaka, for your thoughtful and thorough review.
>
> There is a CONTRIBUTING.md file, it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for, interesting that you had the same comment though. Also, it does have the proper `URL` and `BugReports` fields in DESCRIPTION.
>
> @Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?
Hi @adamhsparks, I just require a bit more time to finish my review. I shall have my review submitted by the 24th of March. Thanks for understanding.
Thanks, @Tinula-kariyawasam
Hi @RWParsons , here is my review:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation, including `URL`, `BugReports` and `Maintainer` fields in DESCRIPTION (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 12
Congratulations on this great package. It is a really well-thought-out project, and no glaring faults or issues are evident at all.
It works really well, is simple and easy to understand.
Thoroughly checks through all edge cases.
Two areas of improvement would entirely satisfy me on this package:
The use of more examples and use cases.
Efficiency and optimisation of the code. To elaborate:
Try to use vectorisation in your functions as much as possible. For example, instead of using a loop to generate simulated data for `df_nmb_training` and `df_nmb_evaluation`, you could use `replicate()` and `do.call()` to generate the data in a more vectorised manner. There are many occurrences of `lapply()` being utilised. A specific example is `screen_simulation_inputs()`, where the function used in the simulations loop can be replaced with a vectorised operation (e.g. `purrr::map(seq_len(nrow(input_grid)), ~ {...})`).
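As a sketch of the `purrr::map()` suggestion (with a hypothetical `input_grid` and a placeholder simulation step, not the package's actual code):

```r
library(purrr)

# Hypothetical grid of simulation inputs:
input_grid <- expand.grid(sample_size = c(500, 1000), sim_auc = c(0.7, 0.8))

# One result per row of the grid, replacing an explicit for-loop:
simulations <- map(seq_len(nrow(input_grid)), ~ {
  row <- input_grid[.x, ]
  # ... run the simulation with row$sample_size and row$sim_auc ...
  list(sample_size = row$sample_size, sim_auc = row$sim_auc)
})
```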
Parallelisation: `lapply()` can be parallelised using the furrr package. By default, `lapply()` runs sequentially, but furrr allows us to run the loop in parallel on multiple cores, which can significantly speed up the computation time. We can parallelise the loop using `future_map()` or `future_pmap()`. In `screen_simulation_inputs()`, for example, `future_map(seq_len(nrow(input_grid)), ~ do_nmb_sim(...))` would run `do_nmb_sim()` on each row of `input_grid` in parallel.
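A minimal sketch of the furrr pattern (with a placeholder `run_sim()` standing in for `do_nmb_sim()`, and illustrative inputs):

```r
library(furrr)
future::plan(future::multisession, workers = 2)

input_grid <- expand.grid(sample_size = c(500, 1000), sim_auc = c(0.7, 0.8))
run_sim <- function(sample_size, sim_auc) mean(rnorm(sample_size))  # placeholder

# future_pmap() calls run_sim() once per row of input_grid, in parallel;
# seed = TRUE gives statistically sound parallel random numbers.
results <- future_pmap(input_grid, run_sim,
                       .options = furrr_options(seed = TRUE))

future::plan(future::sequential)  # reset to sequential processing
```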
Avoiding unnecessary computations: In several places in the code, unnecessary computations are performed. For example, in the screen_meta section at the end of screen simulations function, the length of each input parameter is checked to determine if it varies across the screen. Instead of doing this check, we can store a list of the varying parameters at the beginning of the function and use it later to construct the screen_meta list. This will avoid unnecessary computations and improve the speed of the function. Another example is from the do_nmb_sim, the df_nmb_training and df_nmb_evaluation data frames are generated for each simulation, even though they are the same for each simulation. You could generate these data frames once outside the main loop and reuse them for each simulation.
Use of data.table instead of dplyr: If we are dealing with small data sets (in the range of hundreds or thousands of rows), then the literature shows that the difference between using data.table and dplyr is minuscule, and it is really a preference of the user. However, to further increase a package's computational speed, I recommend taking advantage of data.table's efficiency. Another place where we can utilise data.table: there are a few places in the code where the tidyr::expand_grid() function is used to create a data frame with all possible combinations of input values. This can be slow for large datasets. Instead, we can use the CJ() function from the data.table package to create a Cartesian join of the input values. For example, CJ(sample_size, sim_auc, event_rate, min_events, meet_min_events) will create a data frame with all possible combinations of sample_size, sim_auc, event_rate, min_events, and meet_min_events.
The if (inherits()) statements can be replaced with if (is.function()) to check if the arguments are functions. This is because inherits() checks if an object inherits from a class, which can be slower than checking if an object is a function.
Lastly, Using appropriate data structures can also improve the efficiency of the code. For example, instead of using a data frame to store the results for each simulation, you could use a matrix or an array to reduce memory usage and improve performance.
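The vectorised data-generation idea suggested above might look something like this minimal base-R sketch (get_sample() here is a hypothetical stand-in for the package's data-generating function, not its actual implementation):

```r
# Hypothetical data-generating function standing in for the package's sampler.
get_sample <- function(n) data.frame(x = rnorm(n), y = rbinom(n, 1, 0.5))

# Loop version: fill a list one element at a time.
sims_loop <- vector("list", 10)
for (i in seq_len(10)) sims_loop[[i]] <- get_sample(100)

# replicate() expresses the same thing in one call; simplify = FALSE
# keeps the result as a list of data frames rather than a matrix.
sims_rep <- replicate(10, get_sample(100), simplify = FALSE)

# do.call() can then bind all simulated data sets into one data frame.
all_sims <- do.call(rbind, sims_rep)
```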
Most of my examples are on the do_nmb_sim and screen_simulation_inputs functions, as they are your two main ones.
I have modified and rewritten your code on my own device in line with the statements I made above, then used the microbenchmark() function from the microbenchmark package to compare the run times of the base and modified functions over 1000 runs; in each test the modified version was quicker (in terms of nanoseconds).
Overall, there are no significant changes that I see need to be made to the package, and all tests and examples are working well. Changes are only needed from the perspective of improving the package's computational efficiency.
@Tinula-kariyawasam, thank you for that detailed review!
@RWParsons, please let me know what assistance you may need.
Thanks for the review and helpful suggestions, @Tinula-kariyawasam !
Where would you like more use-cases and examples? Do you mean in addition to the vignettes or could any particular areas of the vignettes be expanded upon more?
I agree that there are definitely some areas where computational speed could be improved and would be happy to improve things (almost) everywhere you suggest. Could you please share with me the edited version of the package that you have on your device?
Hi @Tinula-kariyawasam, I've just been testing out some of your suggestions and working on some other areas where computational efficiency could be improved, now that you've brought my attention there.
Try to use vectorisation in your functions as much as possible. For example, instead of using a loop to generate simulated data for df_nmb_training and df_nmb_evaluation, you could use replicate() and do.call() to generate the data in a more vectorised manner. There are many occurrences of the lapply() function being utilised. A specific example is from screen_simulation_inputs, where the function used in the simulations loop can be replaced with a vectorised operation (e.g. purrr::map(seq_len(nrow(input_grid)), ~ {...})).
I know there are a lot of lapply()'s about in the code. I just tested swapping the one that's used most heavily (lapply(seq_len(n_sims), f_iteration_wrapper) from do_nmb_sim.R) for replicate() and had almost no performance gain, but it did introduce the problem of not being able to pass the iteration number so that df_nmb_training is indexed appropriately. I think the uses of lapply() in screen_simulation_inputs() are pretty inconsequential since they are only used for manipulating the grid of inputs to do_nmb_sim(), which is the function that does all the heavy lifting.
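The index-passing point above can be illustrated with a toy function (f_iteration here is purely hypothetical, not the package's f_iteration_wrapper): lapply(seq_len(n), f) hands each call its iteration number, which replicate() does not.

```r
# Hypothetical per-iteration function: the index i lets each iteration
# look up its own pre-generated training set (or seed, or grid row).
f_iteration <- function(i) {
  data.frame(iteration = i, estimate = mean(rnorm(50)))
}

# lapply over the iteration sequence passes i through to the function.
results <- lapply(seq_len(5), f_iteration)

# replicate(5, f_iteration()) would run the body 5 times but could not
# tell the function which iteration it is on.
results[[3]]$iteration  # each element knows which iteration produced it
```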
Parallelisation: The lapply() function can be parallelised using the furrr package. By default, lapply() runs sequentially, but furrr allows us to run the loop in parallel on multiple cores, which can significantly speed up the computation time. We can parallelise the loop using future_map() or future_pmap(). In the screen_simulation_inputs function, for example future_map(input_grid, ~ do_nmb_sim(...)) will run the function do_nmb_sim() on each row of input_grid in parallel.
I agree that parallelisation is beneficial here! I had initially done almost exactly as you suggested - mapping the input_grid to do_nmb_sim() in parallel. However, I found that this was problematic when the input grid was small and the number of workers available much larger (which I think is likely going to be the case for most users). So, instead, I moved the parallelisation to the simulation-iteration level within do_nmb_sim(). I'm not currently using {furrr} but parallel::parLapply() instead, to keep the dependencies down. I just today tested using furrr::future_map() and had no performance gain compared to parLapply(). I acknowledge it may have been a better choice earlier on, since I wouldn't have to do the cluster export etc., but now that it's set up, I'm reluctant to revert for little performance gain and an additional dependency.
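A minimal sketch of this dependency-free approach (not the package's actual code; f_iteration is a hypothetical stand-in for the per-iteration work), including the cluster export step mentioned above:

```r
library(parallel)

# Hypothetical per-iteration work function.
f_iteration <- function(i) mean(rnorm(100))

cl <- makeCluster(2)
# clusterExport() copies objects to the workers; it is needed for any
# globals the iteration function references (harmless here, shown for
# illustration of the export step).
clusterExport(cl, "f_iteration")
res <- parLapply(cl, seq_len(8), f_iteration)
stopCluster(cl)
```

Base R's parallel package ships with R itself, so this adds no dependency, at the cost of managing the cluster and exports by hand.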
Avoiding unnecessary computations: In several places in the code, unnecessary computations are performed. For example, in the screen_meta section at the end of screen simulations function, the length of each input parameter is checked to determine if it varies across the screen. Instead of doing this check, we can store a list of the varying parameters at the beginning of the function and use it later to construct the screen_meta list. This will avoid unnecessary computations and improve the speed of the function. Another example is from the do_nmb_sim, the df_nmb_training and df_nmb_evaluation data frames are generated for each simulation, even though they are the same for each simulation. You could generate these data frames once outside the main loop and reuse them for each simulation.
I agree that the code to create the screen_meta object is a bit clunky. I have cleaned this up now so it is a lot cleaner and probably a bit faster, although I think this bit of code is likely not affecting total runtime since it's only ever run once.
Use of data.table instead of dplyr: If we are dealing with small data sets (in the range of hundreds or thousands of rows), then the literature shows that the difference between using data.table and dplyr is minuscule, and it is really a preference of the user. However, to further increase a package's computational speed, I recommend taking advantage of data.table's efficiency. Another place where we can utilise data.table: there are a few places in the code where the tidyr::expand_grid() function is used to create a data frame with all possible combinations of input values. This can be slow for large datasets. Instead, we can use the CJ() function from the data.table package to create a Cartesian join of the input values. For example, CJ(sample_size, sim_auc, event_rate, min_events, meet_min_events) will create a data frame with all possible combinations of sample_size, sim_auc, event_rate, min_events, and meet_min_events.
The only times expand.grid() is used are on the simulation input grid, and not at the iteration level of the simulations, so likely only < 50 rows for most users. Also, I intentionally use a tibble and tidyr::expand_grid() because it allows me to have a column that contains functions with minimal fuss (the fx_nmb_training and fx_nmb_evaluation). I appreciate the tip for the Cartesian join with data.table, but unfortunately I don't think it's going to lead to a large performance gain overall for users due to its use on only ever small datasets (and never at a per-simulation-iteration level).
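For readers curious about the function-column point: the package uses a tibble via tidyr::expand_grid(), but the underlying idea can be sketched in base R with a list column wrapped in I() (the column names below are illustrative, not the package's actual grid):

```r
# A grid of simulation inputs where each row also carries a function.
grid <- data.frame(sample_size = c(100, 500))
grid$fx_nmb <- I(list(mean, median))  # one function per row (illustrative)

# Each row now pairs its inputs with the sampler function to apply.
grid$fx_nmb[[1]](c(1, 2, 3))  # calls mean() -> 2
```

Tibbles make this pattern far less fiddly, which is the "minimal fuss" referred to above.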
In terms of other ways to use data.table... currently, almost all the compute time is spent mapping the iteration sequence to f_iteration_wrapper(). This is all done using either lapply(), parLapply(), or pblapply() (if done in parallel). These return a list of outputs that are then transformed back into matrices and data.frames, but that extra transformation is extremely fast comparatively. I'm not sure if there is a way to take advantage of {data.table} to speed this section up. If not, I'm hesitant to add the extra dependency.
The if (inherits()) statements can be replaced with if (is.function()) to check if the arguments are functions. This is because inherits() checks if an object inherits from a class, which can be slower than checking if an object is a function.
I have updated throughout the package to avoid using inherits() for functions and have changed them to is.function(). Thanks!
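For reference, the two checks agree on plain functions, but is.function() is the more direct (and typically faster) test:

```r
f <- function(x) x + 1

inherits(f, "function")        # TRUE: dispatches through the class system
is.function(f)                 # TRUE: direct type check
is.function("not a function")  # FALSE
```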
Lastly, Using appropriate data structures can also improve the efficiency of the code. For example, instead of using a data frame to store the results for each simulation, you could use a matrix or an array to reduce memory usage and improve performance.
Currently, the df_result and df_thresholds objects are created (when running the simulations) as matrix objects and only at the end transformed into data.frames. Today I tested having get_sample() return a matrix rather than a data.frame but found no performance benefit. The speed gain in running get_sample() was lost due to the additional time required to access columns from the matrix in do_nmb_sim(). (This was surprising to me but I read more about why here - https://stackoverflow.com/questions/54005918/extract-column-from-data-frame-faster-than-from-matrix-why)
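A tiny sketch of the comparison described above (column names are illustrative): the same column can be pulled from a data frame or from an equivalent matrix, and both give identical values, so the choice is purely about access speed and memory rather than results.

```r
# Same data held two ways.
df <- data.frame(a = rnorm(1000), b = rnorm(1000))
m  <- as.matrix(df)

# Column access from either structure yields the same values; a data
# frame column is just the underlying vector, while m[, "a"] must
# subset the matrix (and here carries row names, hence unname()).
identical(df$a, unname(m[, "a"]))  # TRUE
```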
Most of my examples are on the do_nmb_sim and screen_simulation_inputs functions, as they are your two main ones.
Correct - these are the two main functions that I expect users to interact with, as they abstract away all the efforts of doing the simulation study. get_nmb_sampler() may be helpful to users to create the inputs for these functions (fx_nmb_training and evaluation) and all other functions are largely internal.
Another change I made to try to improve performance was swapping out stats::glm() for stats::glm.fit() and getting the predicted probabilities from the estimated coefficients a bit more directly, rather than using predict.glm(). This code is run on each iteration of the simulation and gave a 15% speed-up on my machine for the model-fitting and predicted-probabilities section within do_nmb_iteration(). I appreciate you bringing my attention towards efficiency gains and testing this!
EDIT: I've just reverted the change to use stats::glm.fit() since it was giving me a bug when the model coefficients were extreme (giving me NA predicted probabilities that would then throw off the cutpoint calcs etc.). It seems that stats::predict.glm() includes some nice NA handling with that extra compute time 👍
Sorry, I forgot to log it on this issue when @Tinula-kariyawasam submitted his review!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1467178633 time 12
Logged review for Tinula-kariyawasam (hours: 12)
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1447725606 time 4.5
Logged review for emitanaka (hours: 4.5)
Date accepted: 2023-03-30
Submitting Author Name: Rex Parsons Submitting Author Github Handle: @RWParsons Other Package Authors Github handles: @robinblythe, @agbarnett Repository: https://github.com/RWParsons/predictNMB Version submitted: 0.0.1 Submission type: Stats Badge grade: silver Editor: @adamhsparks Reviewers: @Tinula-kariyawasam, @emitanaka
Due date for @Tinula-kariyawasam: 2023-02-27 Due date for @emitanaka: 2023-02-28 Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):
Statistical Packages
Pre-submission Inquiry
General Information
Who is the target audience and what are scientific applications of this package?
Paste your responses to our General Standard G1.1 here, describing whether your software is:
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Badging
What grade of badge are you aiming for? (bronze, silver, gold)
If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)
get_nmb_sampler).
Technical checks
Confirm each of the following by checking the box.
I have run autotest checks on the package, and ensured no tests fail. (I attempted to run autotest, but I get an error when I do so when testing the example code. I get the error even when I remove the documentation that initially raises the error, so I think there might be an issue with autotest (?))
The srr_stats_pre_submit() function confirms this package may be submitted.
The pkgcheck() function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.
This package:
Publication options
Code of conduct