https://mlr3proba.mlr-org.com/
GNU Lesser General Public License v3.0

mlr3proba

Probabilistic Supervised Learning for mlr3 (website: https://mlr3proba.mlr-org.com/).

What is mlr3proba?

mlr3proba is a machine learning toolkit for making probabilistic predictions within the mlr3 ecosystem. It currently supports the following tasks:

  1. Predictive survival analysis: survival analysis where individual hazards and survival distributions can be queried.
  2. Unconditional distribution estimation: the main returned output is the distribution. Sub-cases are density estimation and unconditional survival estimation.
  3. Probabilistic supervised regression: supervised regression with a predictive distribution as the return type.

The survival analysis part is considered mature; the other two are in early stages of development.
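
As a rough illustration of the first task type, the sketch below fits a Cox model and queries the predicted survival distributions. It assumes that mlr3, mlr3proba, and the survival and distr6 packages are installed, and that the rats task, the surv.coxph learner, and the crank and distr predict types are available in the installed version; the time points 50 and 100 are arbitrary choices for illustration.

```r
library(mlr3)
library(mlr3proba)

set.seed(1)

# Built-in survival task and a Cox proportional hazards learner
# (both assumed to be registered by the installed mlr3proba version).
task <- tsk("rats")
learner <- lrn("surv.coxph")

# Hold out 20% of the rows for prediction.
split <- partition(task, ratio = 0.8)
learner$train(task, row_ids = split$train)
pred <- learner$predict(task, row_ids = split$test)

# Continuous risk ranking, one value per test observation.
head(pred$crank)

# Predicted survival probabilities at two illustrative time points.
surv_probs <- pred$distr$survival(c(50, 100))
```

The same prediction object carries several predict types at once, so ranking-based and distribution-based quantities can be read from a single call to predict.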

Feature Overview

The key features of mlr3proba focus on survival analysis.

Installation

mlr3proba is not currently on CRAN. Install it using one of the two methods below:

R-universe

install.packages("mlr3proba", repos = "https://mlr-org.r-universe.dev")

Or for easier installation going forward:

  1. Run usethis::edit_r_profile(), then in the file that opens add or edit options to look something like:
options(repos = c(
  raphaels1 = "https://raphaels1.r-universe.dev",
  mlrorg    = "https://mlr-org.r-universe.dev", # add this line
  CRAN      = "https://cloud.r-project.org"
))
  2. Save and close the file, then restart your R session.
  3. Run install.packages("mlr3proba") as usual.

GitHub

Install the latest development version:

remotes::install_github("mlr-org/mlr3proba")

Learners

Measures

For density estimation and probabilistic regression, only the log-loss is currently implemented. For survival analysis, see the full list of measures on the package website.

Some commonly used measures are the following:

| ID | Measure | Package | Category | Prediction Type |
|---|---|---|---|---|
| surv.dcalib | D-Calibration | mlr3proba | Calibration | distr |
| surv.cindex | Concordance Index | mlr3proba | Discrimination | crank |
| surv.uno_auc | Uno’s AUC | survAUC | Discrimination | lp |
| surv.graf | Integrated Brier Score | mlr3proba | Scoring Rule | distr |
| surv.rcll | Right-Censored Log loss | mlr3proba | Scoring Rule | distr |
| surv.intlogloss | Integrated Log-Likelihood | mlr3proba | Scoring Rule | distr |
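
The measures listed above can be requested by their IDs. The following is a minimal sketch, assuming that mlr3, mlr3proba, and the survival and distr6 packages are installed and that the rats task and the surv.coxph learner are available; the choice of three folds and of these three measures is arbitrary.

```r
library(mlr3)
library(mlr3proba)

set.seed(1)

task <- tsk("rats")
learner <- lrn("surv.coxph")

# 3-fold cross-validation of the Cox model.
rr <- resample(task, learner, rsmp("cv", folds = 3))

# One discrimination measure (uses crank) and two scoring rules (use distr).
measures <- msrs(c("surv.cindex", "surv.graf", "surv.rcll"))

# Named numeric vector of scores, aggregated over the folds.
scores <- rr$aggregate(measures)
print(scores)
```

Each measure reads the prediction type shown in the table, so the learner must be able to return that type for the measure to be computed.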

Bugs, Questions, Feedback

mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

When reporting problems or bugs, it often helps to provide a “minimum working example” using reprex that demonstrates the behavior.

Similar Projects

Predecessors to this package are the earlier implementations of survival modelling in mlr. The skpro package in the Python/scikit-learn ecosystem follows a similar interface for probabilistic supervised learning and is an architectural predecessor. Several packages allow probabilistic predictive modelling through a general interface specific to Bayesian models, such as rjags and Stan. For implementations of a few survival models and measures, the central package is survival. There does not appear to be a package that provides an architectural framework for distribution/density estimation; see this list for a review of density estimation packages in R.

Acknowledgements

Several people contributed to the building of mlr3proba. Firstly, thanks to Michel Lang for writing mlr3survival. Several learners and measures implemented in mlr3proba, as well as the prediction, task, and measure surv objects, were initially written in mlr3survival before being absorbed into mlr3proba. Secondly, thanks to Franz Király for major contributions to the design of the proba-specific parts of the package, including compositors and predict types, and for mathematical contributions to the scoring rules implemented in the package. Finally, thanks to Bernd Bischl and the rest of the mlr core team for building mlr3 and for many conversations about the design of mlr3proba.

Citing mlr3proba

If you use mlr3proba, please cite our Bioinformatics article:

@Article{,
  title = {mlr3proba: An R Package for Machine Learning in Survival Analysis},
  author = {Raphael Sonabend and Franz J Király and Andreas Bender and Bernd Bischl and Michel Lang},
  journal = {Bioinformatics},
  month = {02},
  year = {2021},
  doi = {10.1093/bioinformatics/btab039},
  issn = {1367-4803},
}