mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0

Handling competing risks in rfsrc/proba #230

Open · funnell opened this issue 2 years ago

funnell commented 2 years ago

Expected Behaviour

Benchmarking to complete for a competing risks model

Actual Behaviour

`Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent`

This error doesn't occur if I make sure the status variable is only 0 or 1. It also doesn't occur if I just run `learner$train(follic_task)`.

Reprex

library(mlr3verse)
#> Loading required package: mlr3
library(mlr3extralearners)
#> 
#> Attaching package: 'mlr3extralearners'
#> The following objects are masked from 'package:mlr3':
#> 
#>     lrn, lrns
library(randomForestSRC)
#> 
#>  randomForestSRC 2.14.0 
#>  
#>  Type rfsrc.news() to see new features, changes, and bug fixes. 
#> 
#> 
#> Attaching package: 'randomForestSRC'
#> The following object is masked from 'package:mlr3verse':
#> 
#>     tune
data(follic, package = "randomForestSRC")
follic_task <- as_task_surv(
  follic, event = "status", time = "time", type = "right"
)
learner = lrn("surv.rfsrc")
benchmark(
  benchmark_grid(
    tasks=list(follic_task), learners=list(learner), resamplings=rsmp("cv", folds=3)
  )
)
#> INFO  [13:58:06.277] [mlr3] Running benchmark with 3 resampling iterations 
#> INFO  [13:58:06.354] [mlr3] Applying learner 'surv.rfsrc' on task 'follic' (iter 1/3)
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent

Created on 2021-11-22 by the reprex package (v2.0.1)
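
As a point of reference for the 0/1 remark above, here is a minimal sketch (not part of the original reprex) of recoding the status variable to a binary event indicator, assuming the usual `follic` coding of 0 = censored, 1 = relapse, 2 = death from other causes; the competing event is simply treated as censoring:

``` r
# Sketch: collapse the competing event into censoring so `status` is strictly 0/1
# (assumes follic$status uses 0 = censored, 1 = relapse, 2 = competing event).
follic_bin <- follic
follic_bin$status <- as.integer(follic_bin$status == 1)
follic_task_bin <- as_task_surv(
  follic_bin, event = "status", time = "time", type = "right"
)
```

With this binary task the benchmark above runs, but deaths are then no longer modelled as a competing event, which defeats the purpose of a competing-risks analysis.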

Session info

``` r
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS/LAPACK: /Users/funnellt/miniconda3/envs/mbml/lib/libopenblasp-r0.3.18.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] randomForestSRC_2.14.0   mlr3extralearners_0.5.15 mlr3verse_0.2.2
#> [4] mlr3_0.13.0
#>
#> loaded via a namespace (and not attached):
#>  [1] fs_1.5.0             RColorBrewer_1.1-2   bbotk_0.4.0
#>  [4] data.tree_1.0.0      mlr3proba_0.4.2      mlr3pipelines_0.4.0
#>  [7] mlr3learners_0.5.0   tools_4.1.1          backports_1.3.0
#> [10] utf8_1.2.2           R6_2.5.1             DBI_1.1.1
#> [13] colorspace_2.0-2     mlr3data_0.5.0       withr_2.4.2
#> [16] mlr3viz_0.5.7        mlr3misc_0.9.5       tidyselect_1.1.1
#> [19] compiler_4.1.1       cli_3.1.0            ooplah_0.1.0
#> [22] lgr_0.4.3            scales_1.1.1         checkmate_2.0.0
#> [25] palmerpenguins_0.1.0 mlr3tuning_0.9.0     stringr_1.4.0
#> [28] digest_0.6.28        rmarkdown_2.11       param6_0.2.3
#> [31] paradox_0.7.1        set6_0.2.3           pkgconfig_2.0.3
#> [34] htmltools_0.5.2      parallelly_1.28.1    fastmap_1.1.0
#> [37] highr_0.9            htmlwidgets_1.5.4    rlang_0.4.12
#> [40] visNetwork_2.1.0     generics_0.1.1       jsonlite_1.7.2
#> [43] dplyr_1.0.7          magrittr_2.0.1       Matrix_1.3-4
#> [46] Rcpp_1.0.7           mlr3fselect_0.6.0    munsell_0.5.0
#> [49] fansi_0.5.0          lifecycle_1.0.1      stringi_1.7.5
#> [52] yaml_2.2.1           grid_4.1.1           parallel_4.1.1
#> [55] dictionar6_0.1.3     listenv_0.8.0        crayon_1.4.2
#> [58] lattice_0.20-45      splines_4.1.1        mlr3cluster_0.1.2
#> [61] knitr_1.35           pillar_1.6.4         mlr3filters_0.4.2
#> [64] uuid_1.0-3           future.apply_1.8.1   codetools_0.2-18
#> [67] reprex_2.0.1         glue_1.5.0           evaluate_0.14
#> [70] data.table_1.14.2    vctrs_0.3.8          distr6_1.6.2
#> [73] gtable_0.3.0         purrr_0.3.4          clue_0.3-60
#> [76] future_1.23.0        assertthat_0.2.1     ggplot2_3.3.5
#> [79] xfun_0.27            pracma_2.3.3         survival_3.2-13
#> [82] tibble_3.1.6         cluster_2.1.2        DiagrammeR_1.0.6.1
#> [85] globals_0.14.0       ellipsis_0.3.2       clusterCrit_1.2.8
```
RaphaelS1 commented 2 years ago

This isn't a bug: mlr3proba doesn't currently support competing risks. When it does, we will add this as a property for learners that handle it. For now, the behaviour above is expected.
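
As a purely illustrative sketch of what such a gate could look like once the property exists (the property name below is hypothetical; today `learner$properties` contains no competing-risks entry):

``` r
# Hypothetical: "crisks" is not a real mlr3 learner property yet; this only
# shows how the existing $properties field would be used for such a check.
learner <- lrn("surv.rfsrc")
if (!("crisks" %in% learner$properties)) {
  stop("This learner does not declare competing-risks support.")
}
```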

RaphaelS1 commented 2 years ago

If you can demonstrate the same problem with a non-competing-risks task, however, then it may be a bug!

funnell commented 2 years ago

@RaphaelS1 would this issue be better placed in another repo, or closed? Also, are there immediate plans to support competing-risks tasks? And if so, would that effort be something an mlr3 novice could easily contribute to?

RaphaelS1 commented 2 years ago

@sebffischer can you transfer to mlr3proba?

@funnell whilst I usually love it when someone volunteers to contribute, unfortunately this first requires an internal design decision about what the implementation should look like. After that, the coding itself is relatively straightforward.

Any preliminary thoughts on this, @adibender?

adibender commented 2 years ago

The Surv object does allow factor variables for the event in order to indicate competing-risks/multi-state outcomes, so specifying the task should be possible. How this is passed to the individual algorithms will be very heterogeneous, however, and not all algorithms have customised methods the way RFSRC does. For those that don't, we would have to split the task into K tasks internally (one for each competing outcome), fit the algorithms to each of them, and then aggregate the results afterwards and during evaluation... I'm not sure mlr3 was designed for this, but maybe it could be done through pipelines? Or how is multi-task classification handled, for example?
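
Two small sketches to make the above concrete (neither is existing mlr3proba functionality): `survival::Surv()` builds a multi-state outcome when the status is a factor whose first level is censoring, and the "split into K tasks" idea corresponds to one cause-specific, right-censored task per event type. The event labels assume the usual `follic` coding of 0 = censored, 1 = relapse, 2 = death:

``` r
library(survival)
library(mlr3proba)

data(follic, package = "randomForestSRC")

# Multi-state / competing-risks outcome: a factor status whose first level
# is treated as censoring (assumed coding: 0 = censored, 1 = relapse, 2 = death).
follic$event <- factor(follic$status, levels = 0:2,
                       labels = c("censored", "relapse", "death"))
y <- Surv(follic$time, follic$event)  # multi-state Surv object

# Sketch of the "split into K tasks" idea: one right-censored task per
# competing event, with the other cause treated as censoring.
cause_specific_tasks <- lapply(1:2, function(k) {
  d <- follic
  d$event <- NULL                        # drop the factor added above
  d$status <- as.integer(d$status == k)  # cause k is the event, rest censored
  as_task_surv(d, event = "status", time = "time", type = "right")
})
```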

RaphaelS1 commented 2 years ago

We'll find time for a design meeting to discuss this properly; there is no easy answer... Pipelines seem wrong because they are too specialised.

bblodfon commented 8 months ago

Discussed with Andreas; we should move this forward at some point.