Closed by @RWParsons 1 year ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
Editor check started
:wave:
Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.
@ropensci-review-bot help
Hello @RWParsons, here are the things you can ask me to do:
# Add an author's response info to the rOpenSci logs
@ropensci-review-bot submit response <AUTHOR_RESPONSE_URL>
# List all available commands
@ropensci-review-bot help
# Show our Code of Conduct
@ropensci-review-bot code of conduct
# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name
# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name
# Various package checks
@ropensci-review-bot check package
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: 0aecd6f6
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL (>= 3)
rOpenSci Statistical Standards (`srr` package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(...)`.

- **base**: list (31), c (26), lapply (10), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), abs (3), for (3), length (3), names (3), return (3), seq_len (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)
- **magrittr**: %>% (45)
- **predictNMB**: get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)
- **ggplot2**: aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)
- **stats**: rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)
- **dplyr**: mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)
- **cutpointr**: cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)
- **tibble**: as_tibble (2), rownames_to_column (1), tibble (1)
- **tidyr**: expand_grid (2), pivot_longer (2)
- **utils**: data (4)
- **rlang**: sym (2)
- **parallel**: clusterExport (1)
- **pmsampsize**: pmsampsize (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1266| 74.2| |
|loc_vignettes | 829| 88.0| |
|loc_tests | 563| 77.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.3| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 21| 75.9| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions) [![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3746509530|pages build and deployment |success |5f5226 | 51|2022-12-21 |
| 3746435406|pkgcheck |success |0aecd6 | 3|2022-12-21 |
| 3746435412|pkgdown |success |0aecd6 | 54|2022-12-21 |
| 3746435410|R-CMD-check |success |0aecd6 | 117|2022-12-21 |
| 3746435408|test-coverage |success |0aecd6 | 16|2022-12-21 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 91.41

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_inbuilt_cutpoint | 25
plot.predictNMBscreen | 20

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 11
Lines should not be more than 80 characters. | 79
:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars
|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |
This package is in top shape and may be passed on to a handling editor
Hello again @RWParsons ,
Just a note that most editors are on a holiday break. We will pick up assigning a handling editor as soon as everyone is back in early January. Have a nice break! ✨
No worries at all - thanks for the update @annakrystalli 👌!
@annakrystalli I just snuck in a quick change to the package to add progress bars, which I had been meaning to do before submission but forgot about until yesterday. I've updated the description in the OP of this issue and will re-run the bot to check the package. Hope this is all fine - thanks!
@ropensci-review-bot check package
Thanks, about to send the query.
:rocket:
Editor check started
:wave:
git hash: 2fef46dd
(Checks marked with :eyes: may be optionally addressed.)
Package License: GPL (>= 3)
rOpenSci Statistical Standards (`srr` package)

This package is in the following category:
:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)
Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |pbapply | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running `s <- pkgstats::pkgstats(...)`.

- **base**: list (31), c (26), lapply (9), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), seq_len (4), abs (3), for (3), length (3), names (3), return (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)
- **magrittr**: %>% (45)
- **predictNMB**: get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)
- **ggplot2**: aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)
- **stats**: rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)
- **dplyr**: mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)
- **cutpointr**: cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)
- **tibble**: as_tibble (2), rownames_to_column (1), tibble (1)
- **tidyr**: expand_grid (2), pivot_longer (2)
- **utils**: data (4)
- **rlang**: sym (2)
- **parallel**: clusterExport (1)
- **pbapply**: pblapply (1)
- **pmsampsize**: pmsampsize (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1304| 74.8| |
|loc_vignettes | 831| 88.0| |
|loc_tests | 591| 78.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.5| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 22| 77.7| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---
Click to see the interactive network visualisation of calls between objects in package
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions) [![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3835432164|pages build and deployment |success |7bf6a7 | 53|2023-01-04 |
| 3835300178|pkgcheck |success |2fef46 | 5|2023-01-04 |
| 3835300175|pkgdown |success |2fef46 | 56|2023-01-04 |
| 3835300177|R-CMD-check |success |2fef46 | 119|2023-01-04 |
| 3835300174|test-coverage |success |2fef46 | 18|2023-01-04 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 90.64

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

function | cyclocomplexity
--- | ---
get_inbuilt_cutpoint | 25
plot.predictNMBscreen | 20
screen_simulation_inputs | 16

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

message | number of times
--- | ---
Avoid library() and require() calls in packages | 11
Lines should not be more than 80 characters. | 79
:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars
|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot check srr
:heavy_check_mark: This package complies with > 50% of all standards and may be submitted.
Thank you @RWParsons for the updates.
We are now looking for a handling editor 👍
Dear @RWParsons,
Today starts my rotation as EiC so the role of @annakrystalli is now mine. I've watched the conversations here and among editors and I see we're still looking for a handling editor. Thanks for your patience.
Hi @maurolepore,
No worries - please let me know if I can do anything to help progress this.
@ropensci-review-bot assign @adamhsparks as editor
Assigned! @adamhsparks is now the editor
Hi @RWParsons, I'll be your handling editor for this submission. This is my first stats package submission, so please bear with me, but I think we'll be alright. ☺️
I've performed the editor checks on your package and have a few items that should be addressed before the reviewers start their work.
- There is a `LazyData: true` field in the DESCRIPTION file, but there is no data in the package, so 'LazyData' should be omitted from DESCRIPTION.
- NB: this package now depends on R (>= 3.5.0). WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: `predictNMB/tests/testthat/fixtures/predictNMBscreen_object.rds`, `predictNMB/tests/testthat/fixtures/predictNMBsim_object.rds`
- Please run `goodpractice::gp()` and address the results as necessary and possible.
- `spelling::spell_check_package()` returns several hits. Please set up spell-checking and specify the language in the DESCRIPTION file. You can do this with the {spelling} package. As a fellow Aussie, I recognise many of these are not misspelled, but {spelling} will allow you to build a wordlist to accept them so they are not flagged in future checks.
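For reference, setting this up with {spelling} might look like the following (a minimal sketch, assuming it is run from the package root):

```r
# Sketch of a {spelling} setup; run from the package root.
# install.packages("spelling")

# Add a Language field to DESCRIPTION, a tests/spelling.R stub,
# and an inst/WORDLIST file:
spelling::spell_check_setup(pkg = ".", lang = "en-US")

# Review the current hits:
spelling::spell_check_package(".")

# Accept the remaining (correctly spelled) words into inst/WORDLIST:
spelling::update_wordlist(".")
```

After the wordlist is populated, subsequent checks only flag newly introduced words.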
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
WORD FOUND IN
AUD detailed-example.Rmd:36,37
behaviour summarising-results-with-predictNMB.Rmd:250
categorised introduction-to-predictNMB.Rmd:63
centre introduction-to-predictNMB.Rmd:117
CMD README.md:8
README.Rmd:28
Codecov README.md:12
README.Rmd:30
colour plot.predictNMBsim.Rd:49,50
Colour plot.predictNMBsim.Rd:59
coloured plot.predictNMBscreen.Rd:60
plot.predictNMBsim.Rd:37
colours plot.predictNMBsim.Rd:48
summarising-results-with-predictNMB.Rmd:221
FN do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:24,67
introduction-to-predictNMB.Rmd:40,42,63
FNs creating-nmb-functions.Rmd:121
FP do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:23,65
introduction-to-predictNMB.Rmd:39,42,63
frac creating-nmb-functions.Rmd:64
grey summarising-results-with-predictNMB.Rmd:221
inb make_summary_table.predictNMBscreen.Rd:26,54
make_summary_table.predictNMBsim.Rd:25,50
make_summary_table.Rd:26,55
plot.predictNMBscreen.Rd:38
plot.predictNMBsim.Rd:29
INB introduction-to-predictNMB.Rmd:133
summarising-results-with-predictNMB.Rmd:110
magrittr pipe.Rd:10,12
maximise creating-nmb-functions.Rmd:32
detailed-example.Rmd:32
introduction-to-predictNMB.Rmd:25
maximises do_nmb_sim.Rd:41
introduction-to-predictNMB.Rmd:160
maximising introduction-to-predictNMB.Rmd:194
minimisation introduction-to-predictNMB.Rmd:123
minimises do_nmb_sim.Rd:44
minimising do_nmb_sim.Rd:44
screen_simulation_inputs.Rd:53
detailed-example.Rmd:95
introduction-to-predictNMB.Rmd:143
modelling creating-nmb-functions.Rmd:35,127,285
nmb make_summary_table.predictNMBscreen.Rd:26,27,53
make_summary_table.predictNMBsim.Rd:25,26,49
make_summary_table.Rd:26,27,54
plot.predictNMBscreen.Rd:38,39
plot.predictNMBsim.Rd:29,30
NMB do_nmb_sim.Rd:41,56,62,66,98
evaluate_cutpoint.Rd:5,16,19,23
get_inbuilt_cutpoint.Rd:20
get_nmb_sampler.Rd:5,59
get_thresholds.Rd:5,14,24
make_summary_table.predictNMBscreen.Rd:53,54
make_summary_table.predictNMBsim.Rd:49,50
make_summary_table.Rd:54,55
screen_simulation_inputs.Rd:42,46
description:2
README.md:21,30,48,73,128,171
README.Rmd:37,41,55,75,126,146
creating-nmb-functions.Rmd:2,17,19,32,39,60,121,123,150,187,207,208,231,245
detailed-example.Rmd:47,68,95,99
introduction-to-predictNMB.Rmd:23,25,35,61,65,83,104,117,145,160
summarising-results-with-predictNMB.Rmd:85,88,99,110
optimisation introduction-to-predictNMB.Rmd:123
optimising do_nmb_sim.Rd:41
screen_simulation_inputs.Rd:53
detailed-example.Rmd:95,123,178
introduction-to-predictNMB.Rmd:143
summarising-results-with-predictNMB.Rmd:221
ORCID predictNMB-package.Rd:23,27,28,33,34
performant README.md:21
README.Rmd:37
pkgcheck README.md:16
README.Rmd:32
predictNMBscreen print.predictNMBscreen.Rd:5,18
predictNMBsim print.predictNMBsim.Rd:5,18
PSA creating-nmb-functions.Rmd:69
QALY creating-nmb-functions.Rmd:212
QALYs get_nmb_sampler.Rd:28
creating-nmb-functions.Rmd:204,210,212
detailed-example.Rmd:32
randomised detailed-example.Rmd:28
roc do_nmb_sim.Rd:48
stabilised summarising-results-with-predictNMB.Rmd:221
summarise make_summary_table.predictNMBscreen.Rd:26
make_summary_table.predictNMBsim.Rd:25
make_summary_table.Rd:26
plot.predictNMBscreen.Rd:38
plot.predictNMBsim.Rd:29
summarising-results-with-predictNMB.Rmd:26
summarising plot.predictNMBscreen.Rd:36
introduction-to-predictNMB.Rmd:125
Summarising README.md:167
README.Rmd:144
summarising-results-with-predictNMB.Rmd:2
TP do_nmb_sim.Rd:57
creating-nmb-functions.Rmd:21,64
introduction-to-predictNMB.Rmd:37,42
unintuitive creating-nmb-functions.Rmd:44
visualise plot.predictNMBscreen.Rd:82
plot.predictNMBsim.Rd:79
README.md:95
README.Rmd:104
introduction-to-predictNMB.Rmd:117,133
summarising-results-with-predictNMB.Rmd:26,85,88,99,110
visualised introduction-to-predictNMB.Rmd:117
visualising README.md:111
README.Rmd:112
Whitty detailed-example.Rmd:180
WTP creating-nmb-functions.Rmd:204,212
detailed-example.Rmd:32
youden do_nmb_sim.Rd:42
detailed-example.Rmd:123
Youden do_nmb_sim.Rd:42
creating-nmb-functions.Rmd:231
detailed-example.Rmd:95,97,144,178
introduction-to-predictNMB.Rmd:160,194
In the README, there is an incomplete thought on line 79, "cl: (Optional) users can pass a cluster as the. If it is passed, the simulations are run in parallel (faster).", emphasis mine.
Admittedly, this isn't my area of research, but I think that prospective users would benefit from a description of what the fields in fx_nmb() are, and from a note that get_nmb_sampler() returns a function rather than a plain object, as this is unusual behaviour in R.
The introductory vignette does cover this, so maybe just a pointer to that section of the document is sufficient for the uninitiated (like me).
There is no "CONTRIBUTING.md", please add one.
And lastly, @RWParsons, could you please add the rOpenSci under-review badge to your README?
[![](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)
I'm looking for reviewers now.
Reviewers: Due date:
Hi @adamhsparks,
This is my first rOpenSci submission, so I'm bound to make mistakes here too!
Thanks for the review - I'll be working on them this week.
I have a CONTRIBUTING.md
file here (https://github.com/RWParsons/predictNMB/blob/master/.github/CONTRIBUTING.md) Is this in the wrong place?
Thanks!
{spelling} will let you update the wordlist to include any words you know are correct. In the past I have used "EN-AU", but it has occasionally not worked properly for me, so I just update the wordlist, leave the language as "EN-US", and use {spelling} to ensure I haven't introduced any new spelling errors after confirming what I wanted for Australian spellings, names, etc.
I missed the CONTRIBUTING.md, sorry. It is in the right spot. Just me getting ahead of myself. :)
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @Tinula-kariyawasam to reviewers
@Tinula-kariyawasam added to the reviewers list. Review due date is 2023-02-27. Thanks @Tinula-kariyawasam for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@Tinula-kariyawasam: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @emitanaka to reviewers
@emitanaka added to the reviewers list. Review due date is 2023-02-28. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records.
:calendar: @Tinula-kariyawasam you have 2 days left before the due date for your review (2023-02-27).
:calendar: @emitanaka you have 2 days left before the due date for your review (2023-02-28).
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no [conflicts of interest](https://devguide.ropensci.org/policies.html#coi) for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
The package includes all the following forms of documentation, including `URL`, `BugReports` and `Maintainer` fields in DESCRIPTION (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 4.5
Hi @RWParsons and other authors,
This is a neat package that I imagine will be helpful in decision making for clinical scenarios! The following are points to address some of the missing components from the rOpenSci guideline and some suggestions to hopefully make the package better. Some small typos/grammar fixes were put in as a pull request.
Contribution guideline appears to be missing (although I note that @ropensci-review-bot seems to indicate it has found it). All other points under "Community guidelines" are satisfied.
I recommend calling `plot` as `autoplot` instead. The general convention is that `plot` is for base plots and `autoplot` is for ggplot outputs. An alternative suggestion is to make it a synonym: `autoplot.predictNMBsim <- plot.predictNMBsim`.
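The synonym route could be sketched like this (assuming the package already defines a `plot.predictNMBsim()` method and imports ggplot2; the roxygen tags are illustrative):

```r
# Sketch: register autoplot() as an alias for the existing plot method.
# plot.predictNMBsim() is assumed to exist elsewhere in the package.
#' @importFrom ggplot2 autoplot
#' @export
autoplot.predictNMBsim <- function(object, ...) {
  plot(object, ...)  # dispatches to plot.predictNMBsim()
}
```

This keeps a single implementation while supporting both calling conventions.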
I suggest removing the `extra_theme` argument from `plot()` and getting people to append a theme to the result of `plot()`.
I find the `return_all_methods` argument in `get_inbuilt_cutpoint()` a bit awkward; it isn't consistent with the main goal of the function (i.e. returning a cutpoint). Would it be better to create another function that returns all inbuilt methods? E.g. `get_inbuilt_cutpoint_methods()`, returning a vector of method names.
I suggest renaming the `ci` argument to `conf.level`, as in `t.test()`, or using a more expressive argument name.
For `make_summary_table()`, I wonder if you want to leverage the S3 generic `summary()`?
For `make_summary_table()`, I kind of think you can combine the `what` and `rename_vector` arguments.
Some performance checks for the ones I know the correct value of:

`get_sample()` - looks good:

```r
# should be roughly 0.7
yardstick::roc_auc(get_sample(0.7, 10000, 0.1), factor(1 - actual), x)

# should be roughly 0.9 and 0.1
table(get_sample(0.7, 10000, 0.1)$actual) / 10000
```
Other functions, I can't think of straightforward tests but I don't see anything particularly wrong in the code.
All tests and examples seem to be working fine.
I am new to reviewing software so apologies if something in this review is not quite right or missing content I should have included.
Thanks, @emitanaka, for your thoughtful and thorough review.
There is a CONTRIBUTING.md file; it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for; interesting that you had the same comment, though. Also, it does have the proper `URL` and `BugReports` fields in DESCRIPTION.
@Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?
Thanks, @emitanaka for the thorough review and useful feedback - it's definitely improved the quality of the package and taught me a bit more about best practice!
I have implemented all your suggested changes except for combining the `what` and `rename_vector` arguments for `make_summary_table()` (now changed to `summary()`). For this method (and likewise for the `plot()` - now `autoplot()` - method), `what` pertains to the value being presented (selected cutpoint/NMB/INB) whereas `rename_vector` corresponds to the cutpoint selection method. Maybe I'm misunderstanding your point and there's a way to combine these that I'm not thinking of, though!
On the `extra_theme` comment: I was adding this to the graphic, but also `... + theme_bw()`. I have moved the `extra_theme` content to `theme_sim()` and shown how to add it in the documentation and in all examples in the README/vignettes. Do you think that I should also avoid applying `theme_bw()` to the graphics being returned (or at least add an argument for controlling this behaviour)?
I do have a performance test for `get_sample()`, but it was within a poorly named test file. I have renamed the test file to `test-get_sample.R`; it assesses the AUC of the resulting model against the value specified when calling `get_sample()`.
> Thanks, @emitanaka, for your thoughtful and thorough review.
>
> There is a CONTRIBUTING.md file, it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for, interesting that you had the same comment though. Also, it does have the proper `URL` and `BugReports` fields in DESCRIPTION.
>
> @Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?
Hi @adamhsparks, I just require a bit more time to finish my review. I shall have my review submitted by the 24th of March. Thanks for understanding.
Thanks, @Tinula-kariyawasam
Hi @RWParsons , here is my review:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation, including `URL`, `BugReports` and `Maintainer` fields in DESCRIPTION (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 12
Congratulations on this great package. It is a really well-thought-out project, and no glaring faults or issues are evident at all.
It works really well, is simple and easy to understand.
Thoroughly checks through all edge cases.
Two areas of improvement would entirely satisfy me on this package:
The use of more examples and use cases.
Efficiency and optimisation of the code. To elaborate:
Try to use vectorisation in your functions as much as possible. For example, instead of using a loop to generate simulated data for `df_nmb_training` and `df_nmb_evaluation`, you could use `replicate()` and `do.call()` to generate the data in a more vectorised manner. There are many occurrences of `lapply()` being utilised. A specific example is `screen_simulation_inputs()`, where the function used in the simulations loop can be replaced with a vectorised operation (e.g. `purrr::map(seq_len(nrow(input_grid)), ~ {...})`).
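As a sketch of the `purrr::map()` suggestion (with a hypothetical `input_grid` and a placeholder simulation step, not the package's actual code):

```r
library(purrr)

# Hypothetical grid of simulation inputs:
input_grid <- expand.grid(sample_size = c(500, 1000), sim_auc = c(0.7, 0.8))

# One result per row of the grid, replacing an explicit for-loop:
simulations <- map(seq_len(nrow(input_grid)), ~ {
  row <- input_grid[.x, ]
  # ... run the simulation with row$sample_size and row$sim_auc ...
  list(sample_size = row$sample_size, sim_auc = row$sim_auc)
})
```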
Parallelisation: `lapply()` can be parallelised using the furrr package. By default, `lapply()` runs sequentially, but furrr allows us to run the loop in parallel on multiple cores, which can significantly speed up the computation time. We can parallelise the loop using `future_map()` or `future_pmap()`. In `screen_simulation_inputs()`, for example, `future_map(seq_len(nrow(input_grid)), ~ do_nmb_sim(...))` would run `do_nmb_sim()` on each row of `input_grid` in parallel.
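A minimal sketch of the furrr pattern (with a placeholder `run_sim()` standing in for `do_nmb_sim()`, and illustrative inputs):

```r
library(furrr)
future::plan(future::multisession, workers = 2)

input_grid <- expand.grid(sample_size = c(500, 1000), sim_auc = c(0.7, 0.8))
run_sim <- function(sample_size, sim_auc) mean(rnorm(sample_size))  # placeholder

# future_pmap() calls run_sim() once per row of input_grid, in parallel;
# seed = TRUE gives statistically sound parallel random numbers.
results <- future_pmap(input_grid, run_sim,
                       .options = furrr_options(seed = TRUE))

future::plan(future::sequential)  # reset to sequential processing
```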
Avoiding unnecessary computations: In several places in the code, unnecessary computations are performed. For example, in the screen_meta section at the end of screen simulations function, the length of each input parameter is checked to determine if it varies across the screen. Instead of doing this check, we can store a list of the varying parameters at the beginning of the function and use it later to construct the screen_meta list. This will avoid unnecessary computations and improve the speed of the function. Another example is from the do_nmb_sim, the df_nmb_training and df_nmb_evaluation data frames are generated for each simulation, even though they are the same for each simulation. You could generate these data frames once outside the main loop and reuse them for each simulation.
Use of data.table instead of dplyr: If we are dealing with small data sets (in the range of hundreds or thousands of rows), then the literature shows that the difference between using data.table and dplyr is minuscule, and it is really a preference of the user. However, to further increase a package's computational speed, I recommend taking advantage of data.table's efficiency. Another place where we can utilise data.table: there are a few places in the code where the tidyr::expand_grid() function is used to create a data frame with all possible combinations of input values. This can be slow for large datasets. Instead, we can use the CJ() function from the data.table package to create a Cartesian join of the input values. For example, CJ(sample_size, sim_auc, event_rate, min_events, meet_min_events) will create a data frame with all possible combinations of sample_size, sim_auc, event_rate, min_events, and meet_min_events.
The if (inherits()) statements can be replaced with if (is.function()) to check if the arguments are functions. This is because inherits() checks if an object inherits from a class, which can be slower than checking if an object is a function.
Lastly, Using appropriate data structures can also improve the efficiency of the code. For example, instead of using a data frame to store the results for each simulation, you could use a matrix or an array to reduce memory usage and improve performance.
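The vectorised data-generation idea suggested above might look something like this minimal base-R sketch (get_sample() here is a hypothetical stand-in for the package's data-generating function, not its actual implementation):

```r
# Hypothetical data-generating function standing in for the package's sampler.
get_sample <- function(n) data.frame(x = rnorm(n), y = rbinom(n, 1, 0.5))

# Loop version: fill a list one element at a time.
sims_loop <- vector("list", 10)
for (i in seq_len(10)) sims_loop[[i]] <- get_sample(100)

# replicate() expresses the same thing in one call; simplify = FALSE
# keeps the result as a list of data frames rather than a matrix.
sims_rep <- replicate(10, get_sample(100), simplify = FALSE)

# do.call() can then bind all simulated data sets into one data frame.
all_sims <- do.call(rbind, sims_rep)
```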
Most of my examples are on the do_nmb_sim and screen_simulation_inputs functions, as they are your two main ones.
I have modified and rewritten your code on my own device in line with the statements I made above, then used the microbenchmark() function from the microbenchmark package to compare the run times of the base and modified functions over 1000 runs; in each test the modified version was quicker (in terms of nanoseconds).
Overall, there are no significant changes that I see need to be made to the package, and all tests and examples are working well. Changes are only needed from the perspective of improving the package's computational efficiency.
@Tinula-kariyawasam, thank you for that detailed review!
@RWParsons, please let me know what assistance you may need.
Thanks for the review and helpful suggestions, @Tinula-kariyawasam !
Where would you like more use-cases and examples? Do you mean in addition to the vignettes or could any particular areas of the vignettes be expanded upon more?
I agree that there are definitely some areas where computational speed could be improved and would be happy to improve things (almost) everywhere you suggest. Could you please share with me the edited version of the package that you have on your device?
Hi @Tinula-kariyawasam, I've just been testing out some of your suggestions and working on some other areas where computational efficiency could be improved, now that you've brought my attention there.
Try to use vectorisation in your functions as much as possible. For example, instead of using a loop to generate simulated data for df_nmb_training and df_nmb_evaluation, you could use replicate() and do.call() to generate the data in a more vectorised manner. There are many occurrences of the lapply() function being utilised. A specific example is from screen_simulation_inputs, where the function used in the simulations loop can be replaced with a vectorised operation (e.g. purrr::map(seq_len(nrow(input_grid)), ~ {...})).
I know there are a lot of lapply()'s about in the code. I just tested swapping the one that's used most heavily (lapply(seq_len(n_sims), f_iteration_wrapper) from do_nmb_sim.R) for replicate() and had almost no performance gain, but it did introduce the problem of not being able to pass the iteration number so that df_nmb_training is indexed appropriately. I think the uses of lapply() in screen_simulation_inputs() are pretty inconsequential since they are only used for manipulating the grid of inputs to do_nmb_sim(), which is the function that does all the heavy lifting.
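The index-passing point above can be illustrated with a toy function (f_iteration here is purely hypothetical, not the package's f_iteration_wrapper): lapply(seq_len(n), f) hands each call its iteration number, which replicate() does not.

```r
# Hypothetical per-iteration function: the index i lets each iteration
# look up its own pre-generated training set (or seed, or grid row).
f_iteration <- function(i) {
  data.frame(iteration = i, estimate = mean(rnorm(50)))
}

# lapply over the iteration sequence passes i through to the function.
results <- lapply(seq_len(5), f_iteration)

# replicate(5, f_iteration()) would run the body 5 times but could not
# tell the function which iteration it is on.
results[[3]]$iteration  # each element knows which iteration produced it
```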
Parallelisation: The lapply() function can be parallelised using the furrr package. By default, lapply() runs sequentially, but furrr allows us to run the loop in parallel on multiple cores, which can significantly speed up the computation time. We can parallelise the loop using future_map() or future_pmap(). In the screen_simulation_inputs function, for example future_map(input_grid, ~ do_nmb_sim(...)) will run the function do_nmb_sim() on each row of input_grid in parallel.
I agree that parallelisation is beneficial here! I had initially done almost exactly as you suggested - mapping the input_grid to do_nmb_sim() in parallel. However, I found that this was problematic when the input grid was small and the number of workers available much larger (which I think is likely going to be the case for most users). So, instead, I moved the parallelisation to the simulation-iteration level within do_nmb_sim(). I'm not currently using {furrr} but parallel::parLapply() instead, to keep the dependencies down. I just today tested using furrr::future_map() and had no performance gain compared to parLapply(). I acknowledge it may have been a better choice earlier on, since I wouldn't have to do the cluster export etc., but now that it's set up, I'm reluctant to revert for little performance gain and an additional dependency.
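A minimal sketch of this dependency-free approach (not the package's actual code; f_iteration is a hypothetical stand-in for the per-iteration work), including the cluster export step mentioned above:

```r
library(parallel)

# Hypothetical per-iteration work function.
f_iteration <- function(i) mean(rnorm(100))

cl <- makeCluster(2)
# clusterExport() copies objects to the workers; it is needed for any
# globals the iteration function references (harmless here, shown for
# illustration of the export step).
clusterExport(cl, "f_iteration")
res <- parLapply(cl, seq_len(8), f_iteration)
stopCluster(cl)
```

Base R's parallel package ships with R itself, so this adds no dependency, at the cost of managing the cluster and exports by hand.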
Avoiding unnecessary computations: In several places in the code, unnecessary computations are performed. For example, in the screen_meta section at the end of screen simulations function, the length of each input parameter is checked to determine if it varies across the screen. Instead of doing this check, we can store a list of the varying parameters at the beginning of the function and use it later to construct the screen_meta list. This will avoid unnecessary computations and improve the speed of the function. Another example is from the do_nmb_sim, the df_nmb_training and df_nmb_evaluation data frames are generated for each simulation, even though they are the same for each simulation. You could generate these data frames once outside the main loop and reuse them for each simulation.
I agree that the code to create the screen_meta object is a bit clunky. I have cleaned this up now so it is a lot cleaner and probably a bit faster, although I think this bit of code is likely not affecting total runtime since it's only ever run once.
Use of data.table instead of dplyr: If we are dealing with small data sets (in the range of hundreds or thousands of rows), then the literature shows that the difference between using data.table and dplyr is minuscule, and it is really a preference of the user. However, to further increase a package's computational speed, I recommend taking advantage of data.table's efficiency. Another place where we can utilise data.table: there are a few places in the code where the tidyr::expand_grid() function is used to create a data frame with all possible combinations of input values. This can be slow for large datasets. Instead, we can use the CJ() function from the data.table package to create a Cartesian join of the input values. For example, CJ(sample_size, sim_auc, event_rate, min_events, meet_min_events) will create a data frame with all possible combinations of sample_size, sim_auc, event_rate, min_events, and meet_min_events.
The only times expand.grid() is used are on the simulation input grid, and not at the iteration level of the simulations, so likely only < 50 rows for most users. Also, I intentionally use a tibble and tidyr::expand_grid() because it allows me to have a column that contains functions with minimal fuss (the fx_nmb_training and fx_nmb_evaluation). I appreciate the tip for the Cartesian join with data.table, but unfortunately I don't think it's going to lead to a large performance gain overall for users due to its use on only ever small datasets (and never at a per-simulation-iteration level).
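For readers curious about the function-column point: the package uses a tibble via tidyr::expand_grid(), but the underlying idea can be sketched in base R with a list column wrapped in I() (the column names below are illustrative, not the package's actual grid):

```r
# A grid of simulation inputs where each row also carries a function.
grid <- data.frame(sample_size = c(100, 500))
grid$fx_nmb <- I(list(mean, median))  # one function per row (illustrative)

# Each row now pairs its inputs with the sampler function to apply.
grid$fx_nmb[[1]](c(1, 2, 3))  # calls mean() -> 2
```

Tibbles make this pattern far less fiddly, which is the "minimal fuss" referred to above.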
In terms of other ways to use data.table... currently, almost all the compute time is spent mapping the iteration sequence to f_iteration_wrapper(). This is all done using either lapply(), parLapply(), or pblapply() (if done in parallel). These return a list of outputs that are then transformed back into matrices and data.frames, but that extra transformation is extremely fast comparatively. I'm not sure if there is a way to take advantage of {data.table} to speed this section up. If not, I'm hesitant to add the extra dependency.
The if (inherits()) statements can be replaced with if (is.function()) to check if the arguments are functions. This is because inherits() checks if an object inherits from a class, which can be slower than checking if an object is a function.
I have updated throughout the package to avoid using inherits() for functions and have changed them to is.function(). Thanks!
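For reference, the two checks agree on plain functions, but is.function() is the more direct (and typically faster) test:

```r
f <- function(x) x + 1

inherits(f, "function")        # TRUE: dispatches through the class system
is.function(f)                 # TRUE: direct type check
is.function("not a function")  # FALSE
```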
Lastly, Using appropriate data structures can also improve the efficiency of the code. For example, instead of using a data frame to store the results for each simulation, you could use a matrix or an array to reduce memory usage and improve performance.
Currently, the df_result and df_thresholds objects are created (when running the simulations) as matrix objects and only at the end transformed into data.frames. Today I tested having get_sample() return a matrix rather than a data.frame but found no performance benefit. The speed gain in running get_sample() was lost due to the additional time required to access columns from the matrix in do_nmb_sim(). (This was surprising to me but I read more about why here - https://stackoverflow.com/questions/54005918/extract-column-from-data-frame-faster-than-from-matrix-why)
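A tiny sketch of the comparison described above (column names are illustrative): the same column can be pulled from a data frame or from an equivalent matrix, and both give identical values, so the choice is purely about access speed and memory rather than results.

```r
# Same data held two ways.
df <- data.frame(a = rnorm(1000), b = rnorm(1000))
m  <- as.matrix(df)

# Column access from either structure yields the same values; a data
# frame column is just the underlying vector, while m[, "a"] must
# subset the matrix (and here carries row names, hence unname()).
identical(df$a, unname(m[, "a"]))  # TRUE
```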
Most of my examples are on the do_nmb_sim and screen_simulation_inputs functions, as they are your two main ones.
Correct - these are the two main functions that I expect users to interact with, as they abstract away all the efforts of doing the simulation study. get_nmb_sampler() may be helpful to users to create the inputs for these functions (fx_nmb_training and evaluation) and all other functions are largely internal.
Another change I made to try to improve performance was swapping out stats::glm() for stats::glm.fit() and getting the predicted probabilities from the estimated coefficients a bit more directly, rather than using predict.glm(). This code is run on each iteration of the simulation and gave a 15% speed-up on my machine for the model-fitting and predicted-probabilities section within do_nmb_iteration(). I appreciate you bringing my attention towards efficiency gains and testing this!
EDIT: I've just reverted the change to use stats::glm.fit() since it was giving me a bug when the model coefficients were extreme (giving me NA predicted probabilities that would then throw off the cutpoint calcs etc.). It seems that stats::predict.glm() includes some nice NA handling with that extra compute time 👍
Sorry, I forgot to log it on this issue when @Tinula-kariyawasam submitted his review!
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1467178633 time 12
Logged review for Tinula-kariyawasam (hours: 12)
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1447725606 time 4.5
Logged review for emitanaka (hours: 4.5)
Date accepted: 2023-03-30
Submitting Author Name: Rex Parsons Submitting Author Github Handle: @RWParsons Other Package Authors Github handles: @robinblythe, @agbarnett Repository: https://github.com/RWParsons/predictNMB Version submitted: 0.0.1 Submission type: Stats Badge grade: silver Editor: @adamhsparks Reviewers: @Tinula-kariyawasam, @emitanaka
Due date for @Tinula-kariyawasam: 2023-02-27 Due date for @emitanaka: 2023-02-28 Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which of our statistical package categories this package falls under. (Please check one appropriate box below):
Statistical Packages
Pre-submission Inquiry
General Information
Who is the target audience and what are scientific applications of this package?
Paste your responses to our General Standard G1.1 here, describing whether your software is:
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Badging
What grade of badge are you aiming for? (bronze, silver, gold)
If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)
get_nmb_sampler).
Technical checks
Confirm each of the following by checking the box.
I have run autotest checks on the package, and ensured no tests fail. (I attempted to run autotest, but I get an error when I do so when testing the example code. I get the error even when I remove the documentation that initially raises the error, so I think there might be an issue with autotest (?))
The srr_stats_pre_submit() function confirms this package may be submitted.
The pkgcheck() function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.
This package:
Publication options
Code of conduct