Pre-submission Inquiry: aorsf; accelerated oblique random survival forests

bcjaeger commented 2 years ago

Submitting Author Name: Byron C. Jaeger
Submitting Author Github Handle: @bcjaeger
Other Package Authors Github handles: @nmpieyeskey, @sawyerWeld
Repository: https://github.com/bcjaeger/aorsf
Submission type: Pre-submission
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: aorsf
Title: Accelerated Oblique Random Survival Forests
Version: 0.0.0.9000
Authors@R: c(
    person(given = "Byron",
           family = "Jaeger",
           role = c("aut", "cre"),
           email = "bjaeger@wakehealth.edu",
           comment = c(ORCID = "0000-0001-7399-2299")),
    person(given = "Nicholas",  family = "Pajewski", role = "ctb"),
    person(given = "Sawyer", family = "Welden", role = "ctb", email = "swelden@wakehealth.edu")
    )
Description: Fit, interpret, and make predictions with oblique random
    survival forests. Oblique decision trees are notoriously slow compared
    to their axis-based counterparts, but 'aorsf' runs as fast or faster than
    axis-based decision tree algorithms for right-censored time-to-event
    outcomes.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
LinkingTo: 
    Rcpp,
    RcppArmadillo
Imports: 
    table.glue,
    Rcpp,
    data.table
URL: https://github.com/bcjaeger/aorsf,
    https://bcjaeger.github.io/aorsf
BugReports: https://github.com/bcjaeger/aorsf/issues
Depends: 
    R (>= 3.6)
Suggests: 
    survival,
    survivalROC,
    ggplot2,
    testthat (>= 3.0.0),
    knitr,
    rmarkdown,
    glmnet,
    covr,
    units
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

Data Lifecycle Packages
- [ ] data retrieval
- [ ] data extraction
- [ ] database access
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] database software bindings
- [ ] geospatial data
- [ ] text data
  
  Statistical Packages
- [ ] Bayesian and Monte Carlo Routines
- [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
- [X] Machine Learning
- [ ] Regression and Supervised Learning
- [ ] Exploratory Data Analysis (EDA) and Summary Statistics
- [ ] Spatial Analyses
- [ ] Time Series Analyses

Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

Random forests are a machine learning algorithm, and this package provides optimized code to fit a specific type of random forest. I am unsure whether aorsf belongs in the machine learning category or in the regression and supervised learning category: random forests are definitely used for supervised learning, but they don't really fit into a 'regression' framework.

If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

Yes

Who is the target audience and what are scientific applications of this package?

Target audience: people who want to develop or interpret a risk prediction model, i.e., a prediction model for right-censored time-to-event outcomes.

Applications: fit an oblique random survival forest, compute predicted risk at a given time, estimate the importance of individual variables, and compute partial dependence to depict relationships between specific predictors and predicted risk.
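
For readers unfamiliar with the term, the difference between an axis-based split and an oblique split can be sketched in a few lines of R. This is only an illustration and not the aorsf implementation: it uses the `survival` package and its `pbc` data, and the choice of predictors (`bili`, `age`), the median cut-point, and the use of `coxph()` to obtain the linear combination are all assumptions made for the example.

```r
library(survival)

# Illustrative data: treat death (status == 2) as the event in survival::pbc.
dat <- data.frame(
  time   = pbc$time,
  status = as.integer(pbc$status == 2),
  bili   = log(pbc$bili),
  age    = pbc$age
)

# Axis-based split: threshold a single predictor (here at its median).
dat$axis_left <- dat$bili <= median(dat$bili)

# Oblique split: threshold a linear combination of several predictors.
# The coefficients come from a Cox model (an assumption for this sketch).
cox <- coxph(Surv(time, status) ~ bili + age, data = dat)
lin_comb <- as.vector(as.matrix(dat[, c("bili", "age")]) %*% coef(cox))
dat$oblique_left <- lin_comb <= median(lin_comb)

# Compare how well each split separates the survival curves (log-rank test).
axis_stat    <- survdiff(Surv(time, status) ~ axis_left, data = dat)$chisq
oblique_stat <- survdiff(Surv(time, status) ~ oblique_left, data = dat)$chisq

print(c(axis = axis_stat, oblique = oblique_stat))
```

A larger log-rank statistic indicates a split that separates the two groups' survival more clearly. An oblique forest repeats this kind of search at every node of every tree, which is why it is usually more expensive to fit than an axis-based forest.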

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

The obliqueRSF package precedes aorsf. The aorsf package runs hundreds of times faster than obliqueRSF and includes novel features for interpretation of the oblique random survival forest (negation importance and ANOVA importance). I developed both packages.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes.

Any other questions or issues we should be aware of?:

None.

bcjaeger commented 2 years ago

Hi, @jooolia. Thanks for helping me get this pre-submission started. Is there anything I can do to help with the pre-submission tasks?

jooolia commented 2 years ago

Hi @bcjaeger, thanks for your patience. The machine learning category seems like a good fit for your package, and the package appears to be in good shape for a full submission whenever you would like.

Regarding the categories, @mpadge may have a bit more to add about this.

Thanks, Julia

bcjaeger commented 2 years ago

Thank you!

mpadge commented 2 years ago

Thanks for your submission, @bcjaeger. Our statistical standards are a work in progress. Please help to improve them by providing feedback, particularly on the appropriateness or otherwise of any particular standard. That can be done informally via GitHub discussions, or more formally via a pull request to that same repo (standards are here). We're also very keen to develop policies on handling cases of potentially ambiguous categories, such as yours. To help that process, I've started this discussion thread. Please offer any insight you can. Thanks!

jooolia commented 2 years ago

Great, thanks @mpadge.

I will close this issue and we look forward to the full submission. Thanks, Julia

ropensci / software-review

Pre-submission Inquiry: aorsf; accelerated oblique random survival forests #525