Probabilistic Supervised Learning for mlr3 (website).
mlr3proba is a machine learning toolkit for making probabilistic
predictions within the mlr3 ecosystem. It currently supports survival
analysis, density estimation, and probabilistic regression.
The survival analysis part is considered mature; the rest are in early stages of development.
Key features of mlr3proba focus on survival analysis and are:

- A task for survival analysis (TaskSurv)
- A train/predict model interface to any probabilistic predictive model (frequentist, Bayesian, Deep Learning, or other)

mlr3proba is not currently on CRAN. Please use one of the following two
methods to install it:
install.packages("mlr3proba", repos = "https://mlr-org.r-universe.dev")
Or, for easier installation going forward:

1. Run usethis::edit_r_profile(), then in the file that opens add or edit options to look something like:

    options(repos = c(
      raphaels1 = "https://raphaels1.r-universe.dev",
      mlrorg = "https://mlr-org.r-universe.dev", # add this line
      CRAN = "https://cloud.r-project.org"
    ))

2. Save the file and restart your R session.
3. Run install.packages("mlr3proba") as usual.

Install the latest development version:
remotes::install_github("mlr-org/mlr3proba")
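
Once the package is installed, a short session along the following lines can confirm that it loads and that the train/predict interface works. This is a minimal sketch, not an official quick-start: it assumes the example task "rats" and the Kaplan-Meier learner "surv.kaplan" are available in the installed version, and the 80/20 split and the seed are arbitrary choices made for illustration.

```r
library(mlr3)
library(mlr3proba)

# "rats" is assumed to be one of the example survival tasks shipped with mlr3proba
task = tsk("rats")

# Hold out 20% of the rows for prediction (split ratio and seed are illustrative)
set.seed(42)
train_ids = sample(task$row_ids, size = floor(0.8 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

# Kaplan-Meier estimator, used through the standard mlr3 train/predict calls
learner = lrn("surv.kaplan")
learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

print(prediction)
```

Swapping "surv.kaplan" for another survival learner id, such as "surv.coxph", should leave the rest of the code unchanged, since the interface is the same for all learners.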
Learners implemented in mlr3proba include the Kaplan-Meier Estimator,
the Cox Proportional Hazards model and the Survival Tree learner.

For density estimation and probabilistic regression, only the log-loss is currently implemented. For survival analysis, the full list of measures is given in the package reference documentation.
Some commonly used measures are the following:

| ID | Measure | Package | Category | Prediction Type |
|---|---|---|---|---|
| surv.dcalib | D-Calibration | mlr3proba | Calibration | distr |
| surv.cindex | Concordance Index | mlr3proba | Discrimination | crank |
| surv.uno_auc | Uno’s AUC | survAUC | Discrimination | lp |
| surv.graf | Integrated Brier Score | mlr3proba | Scoring Rule | distr |
| surv.rcll | Right-Censored Log loss | mlr3proba | Scoring Rule | distr |
| surv.intlogloss | Integrated Log-Likelihood | mlr3proba | Scoring Rule | distr |
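
As an illustration of how the measure IDs in the table are used, the sketch below scores a Cox Proportional Hazards prediction with the concordance index and the integrated Brier score. It assumes the example task "rats" and the learner "surv.coxph" are available in the installed version; the split and the seed are arbitrary. The task and training rows are passed to the scoring call as a precaution, since some measures may use the training data.

```r
library(mlr3)
library(mlr3proba)

# Example task and an illustrative 80/20 split (both are assumptions)
task = tsk("rats")
set.seed(42)
train_ids = sample(task$row_ids, size = floor(0.8 * task$nrow))
test_ids = setdiff(task$row_ids, train_ids)

# Cox PH returns crank, lp and distr predictions, so both measures below apply
learner = lrn("surv.coxph")
learner$train(task, row_ids = train_ids)
prediction = learner$predict(task, row_ids = test_ids)

# Look up measures by the IDs from the table and score the prediction
measures = msrs(c("surv.cindex", "surv.graf"))
scores = prediction$score(measures, task = task, train_set = train_ids)

print(scores)
```

The result is a named numeric vector with one entry per measure ID. Each measure reads the prediction type listed in the last column of the table, so a learner that does not produce that type cannot be scored with it.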
mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!
In case of problems / bugs, it is often helpful if you provide a “minimum working example” using reprex that showcases the behavior.
Predecessors to this package are previous instances of survival
modelling in mlr. The skpro package in the Python/scikit-learn ecosystem
follows a similar interface for probabilistic supervised learning and is
an architectural predecessor. Several packages exist which allow
probabilistic predictive modelling with a general interface specific to
Bayesian models, such as rjags and stan. For implementation of a few
survival models and measures, a central package is survival. There does
not appear to be a package that provides an architectural framework for
distribution/density estimation; see this list for a review of density
estimation packages in R.
Several people contributed to the building of mlr3proba. Firstly,
thanks to Michel Lang for writing mlr3survival. Several learners and
measures implemented in mlr3proba, as well as the prediction, task,
and measure surv objects, were written initially in mlr3survival
before being absorbed into mlr3proba. Secondly, thanks to Franz Kiraly
for major contributions towards the design of the proba-specific parts
of the package, including compositors and predict types, and for
mathematical contributions towards the scoring rules implemented in the
package. Finally, thanks to Bernd Bischl and the rest of the mlr core
team for building mlr3 and for many conversations about the design of
mlr3proba.
If you use mlr3proba, please cite our Bioinformatics article:
@Article{,
title = {mlr3proba: An R Package for Machine Learning in Survival Analysis},
author = {Raphael Sonabend and Franz J Király and Andreas Bender and Bernd Bischl and Michel Lang},
journal = {Bioinformatics},
month = {02},
year = {2021},
doi = {10.1093/bioinformatics/btab039},
issn = {1367-4803},
}