ropensci / software-review

rOpenSci Software Peer Review.

predictNMB: Clinical Prediction Models by Net Monetary Benefit #566

Closed RWParsons closed 1 year ago

RWParsons commented 1 year ago

Date accepted: 2023-03-30

Submitting Author Name: Rex Parsons
Submitting Author Github Handle: @RWParsons
Other Package Authors Github handles: @robinblythe, @agbarnett
Repository: https://github.com/RWParsons/predictNMB
Version submitted: 0.0.1
Submission type: Stats
Badge grade: silver
Editor: @adamhsparks
Reviewers: @Tinula-kariyawasam, @emitanaka

Due date for @Tinula-kariyawasam: 2023-02-27
Due date for @emitanaka: 2023-02-28

Archive: TBD
Version accepted: TBD
Language: en

Package: predictNMB
Type: Package
Title: Evaluate Clinical Prediction Models by Net Monetary Benefit
Version: 0.0.1
Authors@R: c(
    person("Rex", "Parsons", role = c("aut", "cre"),
           email = "rex.parsons94@gmail.com",
           comment = c(ORCID = "0000-0002-6053-8174")),
    person("Robin", "Blythe", role = "aut",
           comment = c(ORCID = "0000-0002-3643-4332")),
    person("Adrian", "Barnett", role = "aut",
           comment = c(ORCID = "0000-0001-6339-0374")),
    person("Susanna", "Cramb", role = "ctb",
           comment = c(ORCID = "0000-0001-9041-9531")),
    person("Steven", "McPhail", role = "ctb",
           comment = c(ORCID = "0000-0002-1463-662X"))
  )
Description: Estimates when and where a model-guided treatment strategy may outperform 
    a treat-all or treat-none approach by Monte Carlo simulation and evaluation of the Net Monetary Benefit (NMB).
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.1
Roxygen: list(markdown = TRUE, roclets = c("namespace", "rd", "srr::srr_stats_roclet"))
Imports: 
    assertthat,
    cutpointr,
    dplyr,
    ggplot2,
    magrittr,
    pmsampsize,
    rlang,
    stats,
    tibble,
    tidyr
Suggests: 
    covr,
    flextable,
    knitr,
    parallel,
    pbapply,
    rmarkdown,
    testthat (>= 3.0.0),
    vdiffr,
    withr
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://rwparsons.github.io/predictNMB/
BugReports: https://github.com/rwparsons/predictNMB/issues

Scope

Pre-submission Inquiry

General Information

Badging

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

Code of conduct

ropensci-review-bot commented 1 year ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Oops, something went wrong with our automatic package checks. Our developers have been notified and package checks will appear here as soon as we've resolved the issue. Sorry for any inconvenience.

RWParsons commented 1 year ago

@ropensci-review-bot help

ropensci-review-bot commented 1 year ago

Hello @RWParsons, here are the things you can ask me to do:


# Add an author's response info to the ROpenSci logs
@ropensci-review-bot submit response <AUTHOR_RESPONSE_URL>

# List all available commands
@ropensci-review-bot help

# Show our Code of Conduct
@ropensci-review-bot code of conduct

# Invite the author of a package to the corresponding rOpenSci team. This command should be issued by the author of the package.
@ropensci-review-bot invite me to ropensci/package-name

# Adds package's repo to the rOpenSci team. This command should be issued after approval and transfer of the package.
@ropensci-review-bot finalize transfer of package-name

# Various package checks
@ropensci-review-bot check package

RWParsons commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Checks for predictNMB (v0.0.1)

git hash: 0aecd6f6

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL (>= 3)


1. rOpenSci Statistical Standards (srr package)

This package is in the following category:

:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.


2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself) and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

list (31), c (26), lapply (10), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), abs (3), for (3), length (3), names (3), return (3), seq_len (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)

magrittr

%>% (45)

predictNMB

get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)

ggplot2

aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)

stats

rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)

dplyr

mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)

cutpointr

cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)

tibble

as_tibble (2), rownames_to_column (1), tibble (1)

tidyr

expand_grid (2), pivot_longer (2)

utils

data (4)

rlang

sym (2)

parallel

clusterExport (1)

pmsampsize

pmsampsize (1)

**NOTE:** Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages.

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1266| 74.2| |
|loc_vignettes | 829| 88.0| |
|loc_tests | 563| 77.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.3| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 21| 75.9| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


4. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions)
[![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3746509530|pages build and deployment |success |5f5226 | 51|2022-12-21 |
| 3746435406|pkgcheck |success |0aecd6 | 3|2022-12-21 |
| 3746435412|pkgdown |success |0aecd6 | 54|2022-12-21 |
| 3746435410|R-CMD-check |success |0aecd6 | 117|2022-12-21 |
| 3746435408|test-coverage |success |0aecd6 | 16|2022-12-21 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 91.41

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

|function | cyclocomplexity|
|:---|---:|
|get_inbuilt_cutpoint | 25|
|plot.predictNMBscreen | 20|

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

|message | number of times|
|:---|---:|
|Avoid library() and require() calls in packages | 11|
|Lines should not be more than 80 characters. | 79|


5. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars


Package Versions

|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

annakrystalli commented 1 year ago

Hello again @RWParsons ,

Just a note that most editors are on a holiday break. We will pick up assigning a handling editor as soon as everyone is back in early January. Have a nice break! ✨

RWParsons commented 1 year ago

No worries at all - thanks for the update @annakrystalli 👌!

RWParsons commented 1 year ago

@annakrystalli I just snuck in a quick change to the package to add progress bars, which I had been meaning to do before submission but forgot about until yesterday - I've updated the description in the OP of this issue and will re-run the bot to check the package. Hope this is all fine - thanks!

RWParsons commented 1 year ago

@ropensci-review-bot check package

ropensci-review-bot commented 1 year ago

Thanks, about to send the query.

ropensci-review-bot commented 1 year ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 1 year ago

Checks for predictNMB (v0.0.1)

git hash: 2fef46dd

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL (>= 3)


1. rOpenSci Statistical Standards (srr package)

This package is in the following category:

:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (69 complied with; 33 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.


2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself) and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

|type |package | ncalls|
|:----------|:----------|------:|
|internal |base | 152|
|internal |predictNMB | 28|
|internal |utils | 4|
|imports |magrittr | 45|
|imports |ggplot2 | 17|
|imports |stats | 16|
|imports |dplyr | 14|
|imports |cutpointr | 10|
|imports |tibble | 4|
|imports |tidyr | 4|
|imports |rlang | 2|
|imports |pmsampsize | 1|
|imports |assertthat | NA|
|suggests |parallel | 1|
|suggests |pbapply | 1|
|suggests |covr | NA|
|suggests |flextable | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|suggests |vdiffr | NA|
|suggests |withr | NA|
|linking_to |NA | NA|

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats()', and examining the 'external_calls' table.

base

list (31), c (26), lapply (9), do.call (7), t (5), unlist (5), by (4), col (4), data.frame (4), mean (4), seq_len (4), abs (3), for (3), length (3), names (3), return (3), switch (3), cbind (2), class (2), nrow (2), rep (2), sample (2), sqrt (2), all.equal (1), as.character (1), as.factor (1), colMeans (1), diff (1), environment (1), isTRUE (1), log (1), ls (1), mapply (1), max (1), min (1), replicate (1), seq_along (1), sort (1), sum (1), suppressWarnings (1), unique (1), vapply (1)

magrittr

%>% (45)

predictNMB

get_inbuilt_cutpoint (4), get_plot_data (3), cost_threshold (2), do_sample_size_calc (2), f_iteration_wrapper (2), get_row_from_sim (2), get_sample (2), total_nmb (2), add_interval (1), add_sample_size_calcs (1), approx_match (1), do_nmb_iteration (1), do_nmb_sim (1), evaluate_cutpoint (1), fill_fx_names (1), get_nmb_sampler (1), get_thresholds (1)

ggplot2

aes (7), ggplot (2), label_wrap_gen (2), geom_histogram (1), geom_linerange (1), geom_segment (1), ggplot_build (1), position_dodge (1), scale_fill_manual (1)

stats

rnorm (4), binomial (2), family (2), glm (2), median (2), pt (2), predict (1), var (1)

dplyr

mutate (4), left_join (3), row_number (2), summarize (2), across (1), inner_join (1), n (1)

cutpointr

cutpointr (5), maximize_metric (2), auc (1), sensitivity (1), specificity (1)

tibble

as_tibble (2), rownames_to_column (1), tibble (1)

tidyr

expand_grid (2), pivot_longer (2)

utils

data (4)

rlang

sym (2)

parallel

clusterExport (1)

pbapply

pblapply (1)

pmsampsize

pmsampsize (1)

**NOTE:** Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

- code in R (100% in 12 files) and
- 3 authors
- 4 vignettes
- no internal data file
- 10 imported packages
- 14 exported functions (median 27 lines of code)
- 44 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages.

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html). The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R | 12| 65.5| |
|files_vignettes | 5| 96.9| |
|files_tests | 8| 88.2| |
|loc_R | 1304| 74.8| |
|loc_vignettes | 831| 88.0| |
|loc_tests | 591| 78.1| |
|num_vignettes | 4| 96.6|TRUE |
|n_fns_r | 58| 61.1| |
|n_fns_r_exported | 14| 56.3| |
|n_fns_r_not_exported | 44| 64.1| |
|n_fns_per_file_r | 3| 55.5| |
|num_params_per_fn | 6| 84.1| |
|loc_per_fn_r | 24| 69.2| |
|loc_per_fn_r_exp | 27| 58.8| |
|loc_per_fn_r_not_exp | 22| 68.3| |
|rel_whitespace_R | 14| 68.5| |
|rel_whitespace_vignettes | 30| 88.6| |
|rel_whitespace_tests | 22| 77.7| |
|doclines_per_fn_exp | 56| 69.3| |
|doclines_per_fn_not_exp | 0| 0.0|TRUE |
|fn_call_network_size | 32| 57.0| |

---

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


4. goodpractice and other checks

Details of goodpractice checks (click to open)

#### 3a. Continuous Integration Badges

[![R-CMD-check](https://github.com/RWParsons/predictNMB/workflows/R-CMD-check/badge.svg)](https://github.com/rwparsons/predictNMB/actions)
[![pkgcheck](https://github.com/RWParsons/predictNMB/workflows/pkgcheck/badge.svg)](https://github.com/rwparsons/predictNMB/actions)

**GitHub Workflow Results**

| id|name |conclusion |sha | run_number|date |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 3835432164|pages build and deployment |success |7bf6a7 | 53|2023-01-04 |
| 3835300178|pkgcheck |success |2fef46 | 5|2023-01-04 |
| 3835300175|pkgdown |success |2fef46 | 56|2023-01-04 |
| 3835300177|R-CMD-check |success |2fef46 | 119|2023-01-04 |
| 3835300174|test-coverage |success |2fef46 | 18|2023-01-04 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 90.64

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

The following functions have cyclocomplexity >= 15:

|function | cyclocomplexity|
|:---|---:|
|get_inbuilt_cutpoint | 25|
|plot.predictNMBscreen | 20|
|screen_simulation_inputs | 16|

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 90 potential issues:

|message | number of times|
|:---|---:|
|Avoid library() and require() calls in packages | 11|
|Lines should not be more than 80 characters. | 79|


5. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following 3 function names are duplicated in other packages:

- `get_sample` from funModeling, openeo, triplot
- `get_thresholds` from brainGraph, healthcareai
- `make_summary_table` from dglars


Package Versions

|package |version |
|:--------|:---------|
|pkgstats |0.1.3 |
|pkgcheck |0.1.0.32 |
|srr |0.0.1.186 |


Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

annakrystalli commented 1 year ago

@ropensci-review-bot check srr

ropensci-review-bot commented 1 year ago

'srr' standards compliance:

:heavy_check_mark: This package complies with > 50% of all standards and may be submitted.

annakrystalli commented 1 year ago

Thank you @RWParsons for the updates.

We are now looking for a handling editor 👍

maurolepore commented 1 year ago

Dear @RWParsons,

Today starts my rotation as EiC so the role of @annakrystalli is now mine. I've watched the conversations here and among editors and I see we're still looking for a handling editor. Thanks for your patience.

RWParsons commented 1 year ago

Hi @maurolepore,

No worries - please let me know if I can do anything to help progress this.

maurolepore commented 1 year ago

@ropensci-review-bot assign @adamhsparks as editor

ropensci-review-bot commented 1 year ago

Assigned! @adamhsparks is now the editor

adamhsparks commented 1 year ago

Hi @RWParsons, I'll be your handling editor for this submission. This is my first stats package submission, so please bear with me, but I think we'll be alright. ☺️

I've performed the editor checks on your package and have a few items that should be addressed before the reviewers start their work.

Editor checks:

Editor comments

Vignettes and documentation
Package integrity

Omit ‘LazyData’ from DESCRIPTION (the package has no internal data file).

NB: this package now depends on R (>= 3.5.0):

WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: ‘predictNMB/tests/testthat/fixtures/predictNMBscreen_object.rds’, ‘predictNMB/tests/testthat/fixtures/predictNMBsim_object.rds’
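
If you ever want to drop that R version requirement, one option (a suggestion only, not something required for this review) is to re-save the test fixtures with the older serialization format:

```r
# Re-save a fixture with serialization version 2 so it can be read by R < 3.5.0.
# Paths are the ones named in the check output above.
obj <- readRDS("tests/testthat/fixtures/predictNMBsim_object.rds")
saveRDS(obj, "tests/testthat/fixtures/predictNMBsim_object.rds", version = 2)
```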

Tests
{goodpractice} checks
Spell check

spelling::spell_check_package() returns several hits. Please set up spell-checking and specify the language in the DESCRIPTION file. You can do this with the {spelling} package. As a fellow Aussie, I recognise many of these are not misspelled, but {spelling} will allow you to build a wordlist to accept them so they are not flagged in future checks.

DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
  WORD               FOUND IN
AUD                detailed-example.Rmd:36,37
behaviour          summarising-results-with-predictNMB.Rmd:250
categorised        introduction-to-predictNMB.Rmd:63
centre             introduction-to-predictNMB.Rmd:117
CMD                README.md:8
                   README.Rmd:28
Codecov            README.md:12
                   README.Rmd:30
colour             plot.predictNMBsim.Rd:49,50
Colour             plot.predictNMBsim.Rd:59
coloured           plot.predictNMBscreen.Rd:60
                   plot.predictNMBsim.Rd:37
colours            plot.predictNMBsim.Rd:48
                   summarising-results-with-predictNMB.Rmd:221
FN                 do_nmb_sim.Rd:57
                   creating-nmb-functions.Rmd:24,67
                   introduction-to-predictNMB.Rmd:40,42,63
FNs                creating-nmb-functions.Rmd:121
FP                 do_nmb_sim.Rd:57
                   creating-nmb-functions.Rmd:23,65
                   introduction-to-predictNMB.Rmd:39,42,63
frac               creating-nmb-functions.Rmd:64
grey               summarising-results-with-predictNMB.Rmd:221
inb                make_summary_table.predictNMBscreen.Rd:26,54
                   make_summary_table.predictNMBsim.Rd:25,50
                   make_summary_table.Rd:26,55
                   plot.predictNMBscreen.Rd:38
                   plot.predictNMBsim.Rd:29
INB                introduction-to-predictNMB.Rmd:133
                   summarising-results-with-predictNMB.Rmd:110
magrittr           pipe.Rd:10,12
maximise           creating-nmb-functions.Rmd:32
                   detailed-example.Rmd:32
                   introduction-to-predictNMB.Rmd:25
maximises          do_nmb_sim.Rd:41
                   introduction-to-predictNMB.Rmd:160
maximising         introduction-to-predictNMB.Rmd:194
minimisation       introduction-to-predictNMB.Rmd:123
minimises          do_nmb_sim.Rd:44
minimising         do_nmb_sim.Rd:44
                   screen_simulation_inputs.Rd:53
                   detailed-example.Rmd:95
                   introduction-to-predictNMB.Rmd:143
modelling          creating-nmb-functions.Rmd:35,127,285
nmb                make_summary_table.predictNMBscreen.Rd:26,27,53
                   make_summary_table.predictNMBsim.Rd:25,26,49
                   make_summary_table.Rd:26,27,54
                   plot.predictNMBscreen.Rd:38,39
                   plot.predictNMBsim.Rd:29,30
NMB                do_nmb_sim.Rd:41,56,62,66,98
                   evaluate_cutpoint.Rd:5,16,19,23
                   get_inbuilt_cutpoint.Rd:20
                   get_nmb_sampler.Rd:5,59
                   get_thresholds.Rd:5,14,24
                   make_summary_table.predictNMBscreen.Rd:53,54
                   make_summary_table.predictNMBsim.Rd:49,50
                   make_summary_table.Rd:54,55
                   screen_simulation_inputs.Rd:42,46
                   description:2
                   README.md:21,30,48,73,128,171
                   README.Rmd:37,41,55,75,126,146
                   creating-nmb-functions.Rmd:2,17,19,32,39,60,121,123,150,187,207,208,231,245
                   detailed-example.Rmd:47,68,95,99
                   introduction-to-predictNMB.Rmd:23,25,35,61,65,83,104,117,145,160
                   summarising-results-with-predictNMB.Rmd:85,88,99,110
optimisation       introduction-to-predictNMB.Rmd:123
optimising         do_nmb_sim.Rd:41
                   screen_simulation_inputs.Rd:53
                   detailed-example.Rmd:95,123,178
                   introduction-to-predictNMB.Rmd:143
                   summarising-results-with-predictNMB.Rmd:221
ORCID              predictNMB-package.Rd:23,27,28,33,34
performant         README.md:21
                   README.Rmd:37
pkgcheck           README.md:16
                   README.Rmd:32
predictNMBscreen   print.predictNMBscreen.Rd:5,18
predictNMBsim      print.predictNMBsim.Rd:5,18
PSA                creating-nmb-functions.Rmd:69
QALY               creating-nmb-functions.Rmd:212
QALYs              get_nmb_sampler.Rd:28
                   creating-nmb-functions.Rmd:204,210,212
                   detailed-example.Rmd:32
randomised         detailed-example.Rmd:28
roc                do_nmb_sim.Rd:48
stabilised         summarising-results-with-predictNMB.Rmd:221
summarise          make_summary_table.predictNMBscreen.Rd:26
                   make_summary_table.predictNMBsim.Rd:25
                   make_summary_table.Rd:26
                   plot.predictNMBscreen.Rd:38
                   plot.predictNMBsim.Rd:29
                   summarising-results-with-predictNMB.Rmd:26
summarising        plot.predictNMBscreen.Rd:36
                   introduction-to-predictNMB.Rmd:125
Summarising        README.md:167
                   README.Rmd:144
                   summarising-results-with-predictNMB.Rmd:2
TP                 do_nmb_sim.Rd:57
                   creating-nmb-functions.Rmd:21,64
                   introduction-to-predictNMB.Rmd:37,42
unintuitive        creating-nmb-functions.Rmd:44
visualise          plot.predictNMBscreen.Rd:82
                   plot.predictNMBsim.Rd:79
                   README.md:95
                   README.Rmd:104
                   introduction-to-predictNMB.Rmd:117,133
                   summarising-results-with-predictNMB.Rmd:26,85,88,99,110
visualised         introduction-to-predictNMB.Rmd:117
visualising        README.md:111
                   README.Rmd:112
Whitty             detailed-example.Rmd:180
WTP                creating-nmb-functions.Rmd:204,212
                   detailed-example.Rmd:32
youden             do_nmb_sim.Rd:42
                   detailed-example.Rmd:123
Youden             do_nmb_sim.Rd:42
                   creating-nmb-functions.Rmd:231
                   detailed-example.Rmd:95,97,144,178
                   introduction-to-predictNMB.Rmd:160,194
Metadata

In the README, there is an incomplete thought on line 79, "cl: (Optional) users can pass a cluster *as the*. If it is passed, the simulations are run in parallel (faster).", emphasis mine.

Admittedly, this isn't my area of research, but I think prospective users would benefit from a description of what the fields in fx_nmb() are, and from noting that get_nmb_sampler() returns a function rather than an object, as this is unusual behaviour in R. The introductory vignette does cover this, so maybe a pointer to that section of the document is sufficient for the uninitiated (like me).
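
To make that "returns a function" point concrete, here is a hypothetical sketch of the idea (this is not the package's implementation; the helper name and cost inputs are invented): an NMB sampler is a function factory, so calling it gives you another function that produces the named vector of NMB values per classification outcome.

```r
# Hypothetical illustration only - names and values are made up for the example.
make_nmb_sampler <- function(outcome_cost, treatment_cost, treatment_effect) {
  function() {
    c(
      TP = -outcome_cost * (1 - treatment_effect) - treatment_cost, # treated, outcome partly avoided
      FP = -treatment_cost,                                         # treated unnecessarily
      TN = 0,                                                       # untreated, no event
      FN = -outcome_cost                                            # untreated, event occurs
    )
  }
}

sampler <- make_nmb_sampler(outcome_cost = 100000, treatment_cost = 1000, treatment_effect = 0.35)
sampler()  # returns a named numeric vector with elements TP, FP, TN, FN
```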

There is no "CONTRIBUTING.md", please add one.

Other comments

And lastly, @RWParsons, could you please add the rOpenSci under-review badge to your README?

[![](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)

I'm looking for reviewers now.


Reviewers: Due date:


RWParsons commented 1 year ago

Hi @adamhsparks,

This is my first rOpenSci submission, so I'm bound to make mistakes here too!

Thanks for the review - I'll be working on them this week.

I have a CONTRIBUTING.md file here: https://github.com/RWParsons/predictNMB/blob/master/.github/CONTRIBUTING.md. Is this in the wrong place?

Thanks!

adamhsparks commented 1 year ago

{spelling} will let you update the wordlist to include any words you know are correct. In the past I have used "EN-AU", but it has occasionally not worked properly for me, so I just update the wordlist, leave the language as "EN-US", and use {spelling} to ensure I haven't introduced any new spelling errors after confirming what I wanted for Australian spellings, names, etc.
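
For reference, a minimal version of that workflow with the {spelling} package might look like this (these are {spelling} functions; the exact words you accept will of course differ):

```r
# One-time setup: adds a Language field to DESCRIPTION and a WORDLIST file
spelling::spell_check_setup(lang = "en-US")

# Re-run the check, then accept any legitimate words (Australian spellings, acronyms, names)
spelling::spell_check_package()
spelling::update_wordlist()
```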

I missed the CONTRIBUTING.md, sorry. It is in the right spot. Just me getting ahead of myself. :)

adamhsparks commented 1 year ago

@ropensci-review-bot seeking reviewers

ropensci-review-bot commented 1 year ago

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/566_status.svg)](https://github.com/ropensci/software-review/issues/566)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

adamhsparks commented 1 year ago

@ropensci-review-bot add @Tinula-kariyawasam to reviewers

ropensci-review-bot commented 1 year ago

@Tinula-kariyawasam added to the reviewers list. Review due date is 2023-02-27. Thanks @Tinula-kariyawasam for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 1 year ago

@Tinula-kariyawasam: If you haven't done so, please fill this form for us to update our reviewers records.

adamhsparks commented 1 year ago

@ropensci-review-bot add @emitanaka to reviewers

ropensci-review-bot commented 1 year ago

@emitanaka added to the reviewers list. Review due date is 2023-02-28. Thanks @emitanaka for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot commented 1 year ago

@emitanaka: If you haven't done so, please fill this form for us to update our reviewers records.

ropensci-review-bot commented 1 year ago

:calendar: @Tinula-kariyawasam you have 2 days left before the due date for your review (2023-02-27).

ropensci-review-bot commented 1 year ago

:calendar: @emitanaka you have 2 days left before the due date for your review (2023-02-28).

emitanaka commented 1 year ago

predictNMB

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 4.5


Review Comments

Hi @RWParsons and other authors,

This is a neat package that I imagine will be helpful in decision making for clinical scenarios! The following are points to address some of the missing components from the rOpenSci guideline and some suggestions to hopefully make the package better. Some small typos/grammar fixes were put in as a pull request.

Some performance checks for ones I know the correct value:

library(predictNMB)  # get_sample() comes from the package under review

# should be roughly 0.7
yardstick::roc_auc(get_sample(0.7, 10000, 0.1), factor(1 - actual), x)
# should be roughly 0.9 and 0.1
table(get_sample(0.7, 10000, 0.1)$actual) / 10000

I am new to reviewing software so apologies if something in this review is not quite right or missing content I should have included.

adamhsparks commented 1 year ago

Thanks, @emitanaka, for your thoughtful and thorough review.

There is a CONTRIBUTING.md file, it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for, interesting that you had the same comment though. Also, it does have the proper URL and BugReports fields in DESCRIPTION.

@Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?

RWParsons commented 1 year ago

Thanks, @emitanaka for the thorough review and useful feedback - it's definitely improved the quality of the package and taught me a bit more about best practice!

I have implemented all your suggested changes except for combining the what and rename_vector for the make_summary_table() (now changed to summary()). For this method (and same for the plot() - now autoplot() - method), the what pertains to the value being presented (selected cutpoint/NMB/INB) whereas the rename_vector corresponds to the cutpoint selection method. Maybe I'm misunderstanding your point and there's a way to combine these that I'm not thinking of though!
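
For readers following along, a hypothetical call showing the distinction being described (argument values here are invented for illustration; `what` selects the quantity reported and `rename_vector` relabels the cutpoint-selection methods):

```r
# Hypothetical usage only - values are illustrative, not taken from the package docs.
summary(
  sim,                                    # a predictNMBsim object
  what = "inb",                           # report incremental net monetary benefit
  rename_vector = c("Treat All" = "all")  # relabel the "all" cutpoint-selection method
)
```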

Regarding the extra_theme comment: I was adding this to the graphic, but also `... + theme_bw()`. I have made the change to move the extra_theme content to theme_sim() and shown how to add it in the documentation and in all examples in the README/vignettes. Do you think I should also avoid applying theme_bw() to the graphics being returned (or at least add an argument to control this behaviour)?
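
As an aside for readers, the pattern described above would look roughly like this (assuming `sim` is a simulation object returned by do_nmb_sim()):

```r
library(ggplot2)
library(predictNMB)

# default plot, then opt in to the package's simulation theme
autoplot(sim) + theme_sim()

# users who want full control can add their own theme instead
autoplot(sim) + theme_bw()
```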

I do have a performance test for get_sample(), but it was within a poorly named test file. I have renamed the test file to test-get_sample.R; it assesses the AUC of the resulting model against the value specified when calling get_sample().
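
A sketch of what such a test could look like (the file itself already exists in the package; the tolerances and exact assertions here are assumptions, reusing the same yardstick call as the review comment above, which would require yardstick in Suggests):

```r
# tests/testthat/test-get_sample.R (illustrative sketch only; predictNMB is
# already loaded by testthat when the package's test suite runs)
test_that("get_sample() roughly reproduces the requested AUC and event rate", {
  set.seed(1)
  dat <- get_sample(0.7, 10000, 0.1)  # auc, n, prevalence (positional, as in the review)

  # event rate should be close to the requested prevalence
  expect_equal(mean(dat$actual), 0.1, tolerance = 0.02)

  # AUC of the simulated predictor should be close to the requested value
  auc <- yardstick::roc_auc(dat, factor(1 - actual), x)$.estimate
  expect_equal(auc, 0.7, tolerance = 0.05)
})
```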

Tinula-kariyawasam commented 1 year ago

> Thanks, @emitanaka, for your thoughtful and thorough review.
>
> There is a CONTRIBUTING.md file, it's in the .github directory. I had the same comment. I'm unsure why I missed it or what I thought I was looking for, interesting that you had the same comment though. Also, it does have the proper URL and BugReports fields in DESCRIPTION.
>
> @Tinula-kariyawasam, are you able to submit your review soon or do you require some more time?

Hi @adamhsparks, I just need a bit more time to finish my review. I shall have my review submitted by the 24th of March. Thanks for your understanding.

adamhsparks commented 1 year ago

Thanks, @Tinula-kariyawasam

Tinula-kariyawasam commented 1 year ago

Hi @RWParsons , here is my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 12


Review Comments

Congratulations on this great package. It is a really well-thought-out project, and no glaring faults or issues are evident at all.

It works really well, is simple and easy to understand.

It thoroughly checks through all edge cases.

Two areas of improvement would entirely satisfy me on this package:

Overall, there are no significant changes that I see need to be made to the package, and all tests and examples are working well. The changes I suggest are purely from the perspective of improving the package's computational efficiency.

adamhsparks commented 1 year ago

@Tinula-kariyawasam, thank you for that detailed review!

@RWParsons, please let me know what assistance you may need.

RWParsons commented 1 year ago

Thanks for the review and helpful suggestions, @Tinula-kariyawasam !

Where would you like more use-cases and examples? Do you mean in addition to the vignettes or could any particular areas of the vignettes be expanded upon more?

I agree that there are definitely some areas where computational speed could be improved, and I'd be happy to improve things (almost) everywhere you suggest. Could you please share with me the edited version of the package that you have on your device?

RWParsons commented 1 year ago

Hi @Tinula-kariyawasam, I've just been testing out some of your suggestions and working on some other areas where computational efficiency could be improved, now that you've brought my attention there.

> Try to use vectorisation in your functions as much as possible. For example, instead of using a loop to generate simulated data for df_nmb_training and df_nmb_evaluation, you could use replicate() and do.call() to generate the data in a more vectorised manner. There are many occurrences of the 'lapply' function being utilised. A specific example is from screen_simulation_inputs, where the function used in the simulations loop can be replaced with a vectorised operation (e.g. purrr::map(seq_len(nrow(input_grid)), ~ {...})).

I know there are a lot of lapply()'s about in the code. I just tested swapping the one that's used most heavily (lapply(seq_len(n_sims), f_iteration_wrapper) from do_nmb_sim.R) and had almost no performance gain when swapping it for replicate(); it also introduced the problem of not being able to pass the iteration number so that df_nmb_training is indexed appropriately. I think the uses of lapply() in screen_simulation_inputs() are pretty inconsequential since they are only used for manipulating the grid of inputs to do_nmb_sim(), which is the function that does all the heavy lifting.
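
For anyone curious, the kind of comparison described above can be sketched like this (the stub below is a stand-in, not the real f_iteration_wrapper()):

```r
library(microbenchmark)

n_sims <- 100
iteration_stub <- function(i) mean(rnorm(1e4))  # placeholder for the real per-iteration work

microbenchmark(
  lapply    = lapply(seq_len(n_sims), iteration_stub),
  replicate = replicate(n_sims, iteration_stub(1), simplify = FALSE),
  times = 20
)
# Note: lapply() also passes the iteration index i, which replicate() cannot do directly.
```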

> Parallelisation: The lapply() function can be parallelised using the furrr package. By default, lapply() runs sequentially, but furrr allows us to run the loop in parallel on multiple cores, which can significantly speed up the computation time. We can parallelise the loop using future_map() or future_pmap(). In the screen_simulation_inputs function, for example future_map(input_grid, ~ do_nmb_sim(...)) will run the function do_nmb_sim() on each row of input_grid in parallel.

I agree that parallelisation is beneficial here! I had initially done almost exactly as you suggested - mapping the input_grid to do_nmb_sim() in parallel. However, I found that this was problematic when the input grid was small and the number of workers available much larger (which I think is likely to be the case for most users). So, instead, I moved the parallelisation to the simulation-iteration level within do_nmb_sim(). I'm not currently using {furrr} but parallel::parLapply() instead, to keep the dependencies down. I just today tested using furrr::future_map() and had no performance gain compared to parLapply(). I acknowledge it may have been a better choice earlier on, since I wouldn't have had to do the cluster export etc., but now that it's set up, I'm reluctant to revert for little performance gain and an additional dependency.
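
For context, the parallel::parLapply() pattern being described looks roughly like this (a generic sketch with stand-in objects, not the code inside do_nmb_sim()):

```r
library(parallel)

sim_settings <- list(n_obs = 1e4)  # stand-in for the real simulation inputs
iteration_stub <- function(i) mean(rnorm(sim_settings$n_obs))

cl <- makeCluster(2)
# objects referenced inside the iteration function must be exported to the workers
clusterExport(cl, varlist = "sim_settings", envir = environment())
results <- parLapply(cl, seq_len(500), iteration_stub)
stopCluster(cl)
```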

> Avoiding unnecessary computations: In several places in the code, unnecessary computations are performed. For example, in the screen_meta section at the end of screen simulations function, the length of each input parameter is checked to determine if it varies across the screen. Instead of doing this check, we can store a list of the varying parameters at the beginning of the function and use it later to construct the screen_meta list. This will avoid unnecessary computations and improve the speed of the function. Another example is from the do_nmb_sim, the df_nmb_training and df_nmb_evaluation data frames are generated for each simulation, even though they are the same for each simulation. You could generate these data frames once outside the main loop and reuse them for each simulation.

I agree that the code to create the screen_meta object is a bit clunky. I have cleaned this up now so it is a lot cleaner and probably a bit faster although I think this bit of code is likely not affecting total runtime since it's only ever run once.

> Use of data.table instead of dplyr: If we are dealing with small data sets (in the ranges of hundreds or thousands) then the literature shows that the difference between using data.table and dplyr is minuscule, and it is really a preference of the user. However, to further increase a package's computational speed, I recommend taking advantage of data.table's efficiency. Another note where we can utilise data.table is where there are a few places in the code where the tidyr::expand_grid() function is used to create a data frame with all possible combinations of input values. This can be slow for large datasets. Instead, we can use the CJ() function from the data.table package to create a Cartesian join of the input values. For example, CJ(sample_size, sim_auc, event_rate, min_events, meet_min_events) will create a data frame with all possible combinations of sample_size, sim_auc, event_rate, min_events, and meet_min_events.

The only place tidyr::expand_grid() is used is on the simulation input grid, and not at the iteration level of the simulations, so it will likely only ever have < 50 rows for most users. Also, I intentionally use a tibble and tidyr::expand_grid() because it allows me to have a column that contains functions with minimal fuss (the fx_nmb_training and fx_nmb_evaluation). I appreciate the tip for the Cartesian join with data.table, but unfortunately I don't think it's going to lead to a large performance gain overall for users due to its use on only ever small datasets (and never on a per-simulation-iteration level).
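
A small illustration of the list-column point (values invented; the real grid is built inside screen_simulation_inputs()):

```r
library(tidyr)

# expand_grid() happily crosses plain vectors with a list of functions,
# which is what the simulation input grid needs for the fx_nmb_* columns.
input_grid <- expand_grid(
  sample_size = c(500, 1000),
  sim_auc = c(0.7, 0.8),
  fx_nmb_training = list(function() c(TP = -10, FP = -1, TN = 0, FN = -20))
)
input_grid  # a tibble with 4 rows and a list column holding the function
```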

In terms of other ways to use data.table... currently, almost all the compute time is spent mapping the iteration sequence to f_iteration_wrapper(). This is all done using either lapply(), parLapply(), or pblapply() (if done in parallel). These return a list of outputs which are then transformed back into matrices and data.frames, but that extra transformation is extremely fast comparatively. I'm not sure if there is a way to take advantage of {data.table} to speed this section up. If not, I'm hesitant to add the extra dependency.

> The if (inherits()) statements can be replaced with if (is.function()) to check if the arguments are functions. This is because inherits() checks if an object inherits from a class, which can be slower than checking if an object is a function.

I have updated throughout the package to avoid using inherits() for functions and have changed them to is.function(). Thanks!

> Lastly, using appropriate data structures can also improve the efficiency of the code. For example, instead of using a data frame to store the results for each simulation, you could use a matrix or an array to reduce memory usage and improve performance.

Currently, the df_result and df_thresholds objects are created (when running the simulations) as matrix objects and only transformed into data.frames at the end. Today I tested having get_sample() return a matrix rather than a data.frame but found no performance benefit. The speed gain in running get_sample() was lost due to the additional time required to access columns from the matrix in do_nmb_sim(). (This was surprising to me, but I read more about why here: https://stackoverflow.com/questions/54005918/extract-column-from-data-frame-faster-than-from-matrix-why)
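
The (perhaps counter-intuitive) column-access point can be checked with a quick benchmark along these lines:

```r
library(microbenchmark)

df  <- data.frame(x = rnorm(1e4), actual = rbinom(1e4, 1, 0.1))
mat <- as.matrix(df)

# extracting a column from a data.frame is typically faster than from a matrix,
# because the matrix extraction has to allocate and copy from one large block of memory
microbenchmark(
  data_frame_col = df$x,
  matrix_col     = mat[, "x"],
  times = 1000
)
```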

> Many of the examples are on the do_nmb_sim and screen_simulation_inputs functions - are they your two main ones?

Correct - these are the two main functions that I expect users to interact with, as they abstract away all the effort of doing the simulation study. get_nmb_sampler() may be helpful to users for creating the inputs to these functions (fx_nmb_training and fx_nmb_evaluation), and all other functions are largely internal.

Another change I made to try to improve performance was swapping out stats::glm() for stats::glm.fit() and getting the predicted probabilities from the estimated coefficients a bit more directly, rather than using predict.glm(). This code is run on each iteration of the simulation, and the swap gave a 15% speed-up on my machine for the model fitting and predicted-probability section within do_nmb_iteration(). I appreciate you bringing my attention towards efficiency gains and testing this!

EDIT: I've just reverted the change to use stats::glm.fit() since it was giving me a bug when the model coefficients were extreme (giving me NA predicted probabilities that would then throw off the cutpoint calcs etc.). It seems that stats::predict.glm() includes some nice NA handling with that extra compute time 👍
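
For readers following the trade-off, here is a generic sketch of the two approaches being compared (made-up data; this is not the package's do_nmb_iteration() code):

```r
set.seed(1)
train  <- data.frame(x = rnorm(1000), actual = rbinom(1000, 1, 0.1))
newdat <- data.frame(x = rnorm(1000))

# current approach: glm() + predict.glm(), which copes with awkward fits
fit    <- glm(actual ~ x, data = train, family = binomial())
p_safe <- predict(fit, newdata = newdat, type = "response")

# faster but more fragile: glm.fit() plus a manual inverse logit on the coefficients;
# if a coefficient comes back NA (e.g. under separation), this yields NA probabilities,
# which is the bug described above
fit_raw <- glm.fit(x = cbind(1, train$x), y = train$actual, family = binomial())
p_fast  <- plogis(drop(cbind(1, newdat$x) %*% fit_raw$coefficients))

all.equal(unname(p_safe), p_fast)  # essentially identical when the fit is well behaved
```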

adamhsparks commented 1 year ago

Sorry, I forgot to log @Tinula-kariyawasam's review on this issue when he submitted it!

adamhsparks commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1467178633 time 12

ropensci-review-bot commented 1 year ago

Logged review for Tinula-kariyawasam (hours: 12)

adamhsparks commented 1 year ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/566#issuecomment-1447725606 time 4.5

ropensci-review-bot commented 1 year ago

Logged review for emitanaka (hours: 4.5)