mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0

ML study #381

Closed fa1999abdi closed 3 months ago

fa1999abdi commented 4 months ago

"I'm planning to do a machine learning study. I want to optimize the hyperparameters, train and benchmark the model using the train set, and then measure the accuracy of the model using the test set, and finally analyze and I want to do time-dependent ROC analysis and get the AUC Roc. I have read the MLR 3 book, but I'm having trouble understanding how to code this particular section. Can you help me complete this part of my study?"

bblodfon commented 4 months ago

@fa1999abdi Can you please copy your code, run reprex::reprex(), and then copy-paste the result in the above post? Add the library versions as well with devtools::session_info(). Then add specific questions/comments on the code, i.e. what works and what doesn't? Where exactly do you not understand what is happening? What do you need that isn't there yet? All of that would help me help you.
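For reference, a minimal invocation could look like this (a sketch; the session_info argument of reprex() appends the devtools::session_info() output automatically):

library(reprex)
# Wrap the code you want to share in braces; reprex() renders it together
# with its output so it can be pasted directly into a GitHub comment.
reprex({
  library(mlr3proba)
  # ... your code here ...
}, session_info = TRUE)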

fa1999abdi commented 4 months ago

Hi John, I would like to share my code with you while maintaining its privacy. Can I send you the reprex file in a secure environment?


bblodfon commented 4 months ago

Hi @fa1999abdi, this is an open-source research project, so I won't spend any time looking at private code. These issues are also meant for the community, so that others who run into similar problems can benefit from them.

fa1999abdi commented 4 months ago

I want to predict the survival time on the test data using the gradient boosting learner and draw the survival graph for the test data. My questions are included as comments next to the code.

library(mlr3verse)
#> Loading required package: mlr3
library(mlr3proba)
library(xgboost)
library(tidyverse)
library(survival)
library(mlr3learners)
library(mlr3filters)
library(reprex)

set.seed(42)
train_indxs = sample(seq_len(nrow(veteran)), 100)

task = as_task_surv(x = veteran, time = 'time', event = 'status')
poe = po('encode')
task = poe$train(list(task))[[1]]

set.seed(42)
part = partition(task, ratio = 0.8)

# I tuned the learner's hyperparameters with auto-tuning on the train data.

learner_xgboost = lrn("surv.xgboost", eta = to_tune(1e-4, 1), gamma = to_tune(1e-4, 1),
                      max_depth = to_tune(1, 35), min_child_weight = to_tune(0, 10))
learner_xgboost$param_set$search_space()
#> <ParamSet>
#>                  id    class lower upper nlevels        default value
#> 1:              eta ParamDbl 1e-04     1     Inf <NoDefault[3]>      
#> 2:            gamma ParamDbl 1e-04     1     Inf <NoDefault[3]>      
#> 3:        max_depth ParamInt 1e+00    35      35 <NoDefault[3]>      
#> 4: min_child_weight ParamDbl 0e+00    10     Inf <NoDefault[3]>
set.seed(456)
at_xgboost = auto_tuner(
  tuner = tnr("random_search", batch_size = 50),
  learner = learner_xgboost,
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 50))

modelxgboost = at_xgboost$train(task, part$train)
#> INFO  [16:07:00.462] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=50, k=0]'
#> INFO  [16:07:00.513] [bbotk] Evaluating 50 configuration(s)
#> INFO  [16:07:00.537] [mlr3] Running benchmark with 150 resampling iterations
#> INFO  [16:07:00.603] [mlr3] Applying learner 'surv.xgboost' on task 'veteran' (iter 1/3)
#> INFO  [16:07:00.671] [mlr3] Applying learner 'surv.xgboost' on task 'veteran' (iter 2/3)
#> ...
#> INFO  [16:07:08.213] [mlr3] Applying learner 'surv.xgboost' on task 'veteran' (iter 2/3)
#> INFO  [16:07:08.261] [mlr3] Applying learner 'surv.xgboost' on task 'veteran' (iter 3/3)
#> INFO  [16:07:08.358] [mlr3] Finished benchmark
#> INFO  [16:07:09.130] [bbotk] Result of batch 1:
#> INFO  [16:07:09.134] [bbotk]         eta       gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [16:07:09.134] [bbotk]  0.69240361 0.804262526         3        8.8348667   0.6641731        0      0
#> INFO  [16:07:09.134] [bbotk]  0.06123976 0.987442239        26        1.5563251   0.6863813        0      0
#> INFO  [16:07:09.134] [bbotk]  0.86951133 0.623217709        32        6.2168556   0.6887658        0      0
#> INFO  [16:07:09.134] [bbotk]  0.87223711 0.830520471        10        3.9803411   0.7069063        0      0
#> INFO  [16:07:09.134] [bbotk]  0.43205868 0.889244607         7        0.6064467   0.6775195        0      0
#> ...
#> INFO  [16:07:09.134] [bbotk]  0.34691562 0.538365898         6        0.1004618   0.6523724        0      0
#> INFO  [16:07:09.134] [bbotk]  0.26071728 0.936020394        28        4.2226101   0.6894068        0      0
#> INFO  [16:07:09.134] [bbotk]  0.63756977 0.646424119        25        8.1569937   0.6821928        0      0
#> INFO  [16:07:09.134] [bbotk]         eta       gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [16:07:09.134] [bbotk]  runtime_learners                                uhash
#> INFO  [16:07:09.134] [bbotk]              0.12 ee4981c6-bd07-4919-8147-4ca021e621c0
#> INFO  [16:07:09.134] [bbotk]              0.09 aaa67e14-d32b-492a-96bc-5246f7a8d993
#> INFO  [16:07:09.134] [bbotk]              0.09 9aa90106-0b83-4894-95b4-ed67b40c52dc
#> INFO  [16:07:09.134] [bbotk]              0.09 0e4fe10e-c165-498c-9d32-b597942cbf43
#> INFO  [16:07:09.134] [bbotk]              0.10 10d1964b-b56e-4cac-b3b9-7baf63e913a3
#> INFO  [16:07:09.134] [bbotk]              0.07 9df279e3-bae1-4fef-837c-c03491abc69b
#> INFO  [16:07:09.134] [bbotk]              0.35 eb095987-6221-4b6a-a3d8-45eb78504aa9
#> INFO  [16:07:09.134] [bbotk]              0.09 7b097928-ef11-4cdd-ac24-f53931ebda25
#> INFO  [16:07:09.134] [bbotk]              0.09 58b02576-bfe1-4b6e-9945-4974b113bfc4
#> INFO  [16:07:09.134] [bbotk]              0.09 3e75a77c-5f84-4c4f-bab4-c87446cea6a7
#> INFO  [16:07:09.134] [bbotk]              0.16 907b12c8-d61e-45dd-9092-f904786106d1
#> INFO  [16:07:09.134] [bbotk]              0.10 94231a85-7ad5-4713-bd24-0e642bd0847f
#> INFO  [16:07:09.134] [bbotk]              0.09 eb417ce3-e597-43b1-a820-7909a63ca1e3
#> INFO  [16:07:09.134] [bbotk]              0.09 9d4de986-135f-4a45-b311-ad1b1e251e89
#> ...
#> INFO  [16:07:09.134] [bbotk]              0.10 d14b06be-ad6c-4201-a377-1ebe01af56a6
#> INFO  [16:07:09.134] [bbotk]              0.09 33564a8b-a664-461c-a614-8490906519de
#> INFO  [16:07:09.134] [bbotk]              0.08 6ecac5d0-155a-4bc9-ba1e-010f5dd2575c
#> INFO  [16:07:09.134] [bbotk]  runtime_learners                                uhash
#> INFO  [16:07:09.144] [bbotk] Finished optimizing after 50 evaluation(s)
#> INFO  [16:07:09.145] [bbotk] Result:
#> INFO  [16:07:09.146] [bbotk]        eta     gamma max_depth min_child_weight learner_param_vals  x_domain
#> INFO  [16:07:09.146] [bbotk]  0.8722371 0.8305205        10         3.980341          <list[8]> <list[4]>
#> INFO  [16:07:09.146] [bbotk]  surv.cindex
#> INFO  [16:07:09.146] [bbotk]    0.7069063
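
# Side note (a sketch, not part of the original reprex): after tuning
# finishes, the chosen configuration and the tuning log can be inspected
# directly on the AutoTuner object.
at_xgboost$tuning_result            # best hyperparameter configuration found
as.data.table(at_xgboost$archive)   # log of all evaluated configurations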

# Recursive Feature Elimination with Cross-Validation
instance = fsi(
  task = task, # How do I use the train data (part$train)?
  learner = lrn("surv.xgboost"), # How can I use the tuned learner from the previous step to select the variables?
  resampling = rsmp("cv", folds = 6),
  measures = msr("surv.cindex"),
  terminator = trm("none")
)

optimizer$optimize(instance) #Why does it give an error?
#> Error in eval(expr, envir, enclos): object 'optimizer' not found

# I also want to subset the task to the optimal feature set
# and train the learner again.

task$select(instance$result_feature_set)
#> Error in .__Task__select(self = self, private = private, super = super, : Assertion on 'cols' failed: Must be of type 'character', not 'NULL'.
learner$train(task, part$train)
#> Error in eval(expr, envir, enclos): object 'learner' not found

# And finally, I use the trained model to predict the survival time
# on the test data, which runs into an error.

p = learner$predict(task, part$test)
#> Error in eval(expr, envir, enclos): object 'learner' not found
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       Asia/Tehran
#>  date     2024-05-14
#>  pandoc   3.1.1 @ D:/R/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version     date (UTC) lib source
#>  backports           1.4.1       2021-12-13 [1] CRAN (R 4.2.0)
#>  bbotk               0.7.3       2023-11-13 [1] CRAN (R 4.2.3)
#>  cachem              1.0.8       2023-05-01 [1] CRAN (R 4.2.3)
#>  callr               3.7.3       2022-11-02 [1] CRAN (R 4.2.3)
#>  checkmate           2.3.0       2023-10-25 [1] CRAN (R 4.2.3)
#>  class               7.3-22      2023-05-03 [2] CRAN (R 4.3.2)
#>  cli                 3.6.1       2023-03-23 [1] CRAN (R 4.2.3)
#>  clue                0.3-65      2023-09-23 [1] CRAN (R 4.2.3)
#>  cluster             2.1.4       2022-08-22 [2] CRAN (R 4.3.2)
#>  codetools           0.2-19      2023-02-01 [2] CRAN (R 4.3.2)
#>  colorspace          2.1-0       2023-01-23 [1] CRAN (R 4.2.3)
#>  crayon              1.5.2       2022-09-29 [1] CRAN (R 4.2.3)
#>  data.table          1.14.8      2023-02-17 [1] CRAN (R 4.2.3)
#>  DEoptimR            1.1-3       2023-10-07 [1] CRAN (R 4.2.3)
#>  devtools            2.4.5       2022-10-11 [1] CRAN (R 4.2.3)
#>  dictionar6          0.1.3       2021-09-13 [1] CRAN (R 4.2.3)
#>  digest              0.6.33      2023-07-07 [1] CRAN (R 4.2.3)
#>  diptest             0.77-0      2023-11-27 [1] CRAN (R 4.2.3)
#>  distr6              1.8.4       2023-12-09 [1] Github (xoopR/distr6@1854b22)
#>  dplyr             * 1.1.4       2023-11-17 [1] CRAN (R 4.2.3)
#>  ellipsis            0.3.2       2021-04-29 [1] CRAN (R 4.2.3)
#>  evaluate            0.22        2023-09-29 [1] CRAN (R 4.2.3)
#>  fansi               1.0.5       2023-10-08 [1] CRAN (R 4.2.3)
#>  fastmap             1.1.1       2023-02-24 [1] CRAN (R 4.2.3)
#>  flexmix             2.3-19      2023-03-16 [1] CRAN (R 4.2.3)
#>  forcats           * 1.0.0       2023-01-29 [1] CRAN (R 4.2.3)
#>  fpc                 2.2-10      2023-01-07 [1] CRAN (R 4.2.3)
#>  fs                  1.6.3       2023-07-20 [1] CRAN (R 4.2.3)
#>  future              1.33.0      2023-07-01 [1] CRAN (R 4.2.3)
#>  future.apply        1.11.0      2023-05-21 [1] CRAN (R 4.2.3)
#>  generics            0.1.3       2022-07-05 [1] CRAN (R 4.2.3)
#>  ggplot2           * 3.4.4       2023-10-12 [1] CRAN (R 4.2.3)
#>  globals             0.16.2      2022-11-21 [1] CRAN (R 4.2.2)
#>  glue                1.6.2       2022-02-24 [1] CRAN (R 4.2.3)
#>  gtable              0.3.4       2023-08-21 [1] CRAN (R 4.2.3)
#>  hms                 1.1.3       2023-03-21 [1] CRAN (R 4.2.3)
#>  htmltools           0.5.6.1     2023-10-06 [1] CRAN (R 4.2.3)
#>  htmlwidgets         1.6.2       2023-03-17 [1] CRAN (R 4.2.3)
#>  httpuv              1.6.11      2023-05-11 [1] CRAN (R 4.2.3)
#>  jsonlite            1.8.7       2023-06-29 [1] CRAN (R 4.2.3)
#>  kernlab             0.9-32      2023-01-31 [1] CRAN (R 4.2.2)
#>  knitr               1.45        2023-10-30 [1] CRAN (R 4.2.3)
#>  later               1.3.1       2023-05-02 [1] CRAN (R 4.2.3)
#>  lattice             0.21-9      2023-10-01 [2] CRAN (R 4.3.2)
#>  lgr                 0.4.4       2022-09-05 [1] CRAN (R 4.2.3)
#>  lifecycle           1.0.3       2022-10-07 [1] CRAN (R 4.2.3)
#>  listenv             0.9.0       2022-12-16 [1] CRAN (R 4.2.3)
#>  lubridate         * 1.9.3       2023-09-27 [1] CRAN (R 4.2.3)
#>  magrittr            2.0.3       2022-03-30 [1] CRAN (R 4.2.3)
#>  MASS                7.3-60      2023-05-04 [1] CRAN (R 4.2.3)
#>  Matrix              1.6-4       2023-11-30 [1] CRAN (R 4.2.3)
#>  mclust              6.0.1       2023-11-15 [1] CRAN (R 4.2.3)
#>  memoise             2.0.1       2021-11-26 [1] CRAN (R 4.2.3)
#>  mime                0.12        2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI              0.1.1.1     2018-05-18 [1] CRAN (R 4.2.3)
#>  mlr3              * 0.17.0.9000 2023-12-09 [1] Github (mlr-org/mlr3@dc2a983)
#>  mlr3cluster         0.1.8       2023-03-12 [1] CRAN (R 4.2.3)
#>  mlr3data            0.7.0       2023-06-29 [1] CRAN (R 4.2.3)
#>  mlr3extralearners   0.7.1-9000  2023-12-09 [1] Github (mlr-org/mlr3extralearners@7546845)
#>  mlr3filters       * 0.7.1       2023-02-15 [1] CRAN (R 4.2.3)
#>  mlr3fselect         0.11.0      2023-03-02 [1] CRAN (R 4.2.3)
#>  mlr3hyperband       0.4.5       2023-03-02 [1] CRAN (R 4.2.3)
#>  mlr3learners      * 0.5.7       2023-11-21 [1] CRAN (R 4.2.3)
#>  mlr3mbo             0.2.1       2023-06-05 [1] CRAN (R 4.2.3)
#>  mlr3misc            0.13.0      2023-09-20 [1] CRAN (R 4.2.3)
#>  mlr3pipelines       0.5.0-2     2023-12-08 [1] CRAN (R 4.2.3)
#>  mlr3proba         * 0.5.4       2023-12-09 [1] Github (mlr-org/mlr3proba@083e685)
#>  mlr3tuning          0.19.2      2023-11-28 [1] CRAN (R 4.2.3)
#>  mlr3tuningspaces    0.4.0       2023-04-20 [1] CRAN (R 4.3.2)
#>  mlr3verse         * 0.2.8       2023-12-13 [1] Github (mlr-org/mlr3verse@2220087)
#>  mlr3viz             0.6.2       2023-11-23 [1] CRAN (R 4.2.3)
#>  modeltools          0.2-23      2020-03-05 [1] CRAN (R 4.2.0)
#>  munsell             0.5.0       2018-06-12 [1] CRAN (R 4.2.3)
#>  nnet                7.3-19      2023-05-03 [2] CRAN (R 4.3.2)
#>  ooplah              0.2.0       2022-01-21 [1] CRAN (R 4.2.3)
#>  palmerpenguins      0.1.1       2022-08-15 [1] CRAN (R 4.2.3)
#>  paradox             0.11.1      2023-03-17 [1] CRAN (R 4.2.3)
#>  parallelly          1.36.0      2023-05-26 [1] CRAN (R 4.2.3)
#>  param6              0.2.4       2023-12-09 [1] Github (xoopR/param6@0fa3577)
#>  pillar              1.9.0       2023-03-22 [1] CRAN (R 4.2.3)
#>  pkgbuild            1.4.2       2023-06-26 [1] CRAN (R 4.2.3)
#>  pkgconfig           2.0.3       2019-09-22 [1] CRAN (R 4.2.3)
#>  pkgload             1.3.3       2023-09-22 [1] CRAN (R 4.2.3)
#>  prabclus            2.3-3       2023-10-24 [1] CRAN (R 4.2.3)
#>  prettyunits         1.2.0       2023-09-24 [1] CRAN (R 4.2.3)
#>  processx            3.8.2       2023-06-30 [1] CRAN (R 4.2.3)
#>  profvis             0.3.8       2023-05-02 [1] CRAN (R 4.2.3)
#>  promises            1.2.1       2023-08-10 [1] CRAN (R 4.2.3)
#>  ps                  1.7.5       2023-04-18 [1] CRAN (R 4.2.3)
#>  purrr             * 1.0.2       2023-08-10 [1] CRAN (R 4.2.3)
#>  R.cache             0.16.0      2022-07-21 [1] CRAN (R 4.2.3)
#>  R.methodsS3         1.8.2       2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo                1.25.0      2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils             2.12.2      2022-11-11 [1] CRAN (R 4.2.3)
#>  R6                  2.5.1       2021-08-19 [1] CRAN (R 4.2.3)
#>  Rcpp                1.0.11      2023-07-06 [1] CRAN (R 4.2.3)
#>  readr             * 2.1.4       2023-02-10 [1] CRAN (R 4.2.3)
#>  remotes             2.4.2.1     2023-07-18 [1] CRAN (R 4.2.3)
#>  reprex            * 2.1.0       2024-01-11 [1] CRAN (R 4.3.2)
#>  rlang               1.1.1       2023-04-28 [1] CRAN (R 4.2.3)
#>  rmarkdown           2.25        2023-09-18 [1] CRAN (R 4.2.3)
#>  robustbase          0.99-1      2023-11-29 [1] CRAN (R 4.2.3)
#>  rstudioapi          0.15.0      2023-07-07 [1] CRAN (R 4.2.3)
#>  scales              1.2.1       2022-08-20 [1] CRAN (R 4.2.3)
#>  sessioninfo         1.2.2       2021-12-06 [1] CRAN (R 4.2.3)
#>  set6                0.2.6       2023-12-09 [1] Github (xoopR/set6@a901255)
#>  shiny               1.7.5       2023-08-12 [1] CRAN (R 4.2.3)
#>  spacefillr          0.3.2       2022-10-25 [1] CRAN (R 4.2.3)
#>  stringi             1.7.12      2023-01-11 [1] CRAN (R 4.2.2)
#>  stringr           * 1.5.0       2022-12-02 [1] CRAN (R 4.2.3)
#>  styler              1.10.2      2023-08-29 [1] CRAN (R 4.2.3)
#>  survival          * 3.5-7       2023-08-14 [1] CRAN (R 4.2.3)
#>  tibble            * 3.2.1       2023-03-20 [1] CRAN (R 4.2.3)
#>  tidyr             * 1.3.0       2023-01-24 [1] CRAN (R 4.2.3)
#>  tidyselect          1.2.0       2022-10-10 [1] CRAN (R 4.2.3)
#>  tidyverse         * 2.0.0       2023-02-22 [1] CRAN (R 4.2.3)
#>  timechange          0.2.0       2023-01-11 [1] CRAN (R 4.2.3)
#>  tzdb                0.4.0       2023-05-12 [1] CRAN (R 4.2.3)
#>  urlchecker          1.0.1       2021-11-30 [1] CRAN (R 4.2.3)
#>  usethis             2.2.2       2023-07-06 [1] CRAN (R 4.2.3)
#>  utf8                1.2.3       2023-01-31 [1] CRAN (R 4.2.3)
#>  uuid                1.1-1       2023-08-17 [1] CRAN (R 4.2.3)
#>  vctrs               0.6.4       2023-10-12 [1] CRAN (R 4.2.3)
#>  withr               2.5.1       2023-09-26 [1] CRAN (R 4.2.3)
#>  xfun                0.40        2023-08-09 [1] CRAN (R 4.2.3)
#>  xgboost           * 1.7.6.1     2023-12-06 [1] CRAN (R 4.2.3)
#>  xtable              1.8-4       2019-04-21 [1] CRAN (R 4.2.3)
#>  yaml                2.3.7       2023-01-23 [1] CRAN (R 4.2.3)
#> 
#>  [1] D:/R/Libraries
#>  [2] C:/Program Files/R/R-4.3.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-05-14 with reprex v2.1.0

bblodfon commented 4 months ago

Hi @fa1999abdi. Please read the documentation and spend more time understanding what each function you use does via the given examples. The problems you had have nothing to do with mlr3proba, so this shouldn't be an issue here. Please ask questions like these on Stack Overflow.

On your issue: see mlr3fselect::fsi(); you need to select one of the mlr_fselectors. The all-in-one solution you can use to do what you want is mlr3fselect::auto_fselector, which has an example. Here is a working example for your code:

library(mlr3verse)
#> Loading required package: mlr3
library(mlr3proba)

task = as_task_surv(x = survival::veteran, time = 'time', event = 'status')
poe = po('encode')
task = poe$train(list(task))[[1]]

set.seed(42)
part = partition(task, ratio = 0.8)

# better "surv.xgboost.cox" to get `distr` predictions as well
xgb = lrn("surv.xgboost.cox", eta = to_tune(1e-4, 1),gamma = to_tune(1e-4, 1),
  max_depth = to_tune(1,35), min_child_weight = to_tune(0, 10))

set.seed(456)
at_xgb = auto_tuner(
  tuner = tnr("random_search"),
  learner = xgb,
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 10)
)

modelxgboost = at_xgb$train(task, part$train)
#> INFO  [00:50:46.374] [bbotk] Starting to optimize 4 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
#> INFO  [00:50:46.460] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:46.476] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:46.516] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:46.578] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:46.628] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:46.874] [mlr3] Finished benchmark
#> INFO  [00:50:46.928] [bbotk] Result of batch 1:
#> INFO  [00:50:46.933] [bbotk]        eta      gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:46.933] [bbotk]  0.6924036 0.06123976        31         8.722243   0.6632974        0      0
#> INFO  [00:50:46.933] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:46.933] [bbotk]             0.295 1393e023-c09e-4837-adfd-9c62c489c292
#> INFO  [00:50:46.950] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:46.959] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:46.965] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:47.016] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:47.311] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:47.361] [mlr3] Finished benchmark
#> INFO  [00:50:47.400] [bbotk] Result of batch 2:
#> INFO  [00:50:47.408] [bbotk]        eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:47.408] [bbotk]  0.5470714 0.6802136        29        0.6464914   0.6759085        0      0
#> INFO  [00:50:47.408] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:47.408] [bbotk]             0.333 92792d90-4c8b-445a-8403-7f1c5cc09187
#> INFO  [00:50:47.426] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:47.434] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:47.441] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:47.489] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:47.539] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:47.586] [mlr3] Finished benchmark
#> INFO  [00:50:47.631] [bbotk] Result of batch 3:
#> INFO  [00:50:47.634] [bbotk]        eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:47.634] [bbotk]  0.5275122 0.6004684         2         2.587627   0.6602793        0      0
#> INFO  [00:50:47.634] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:47.634] [bbotk]             0.089 a597fe3f-0a93-4a36-9287-0620970e887a
#> INFO  [00:50:47.651] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:47.659] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:47.665] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:47.714] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:47.779] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:47.831] [mlr3] Finished benchmark
#> INFO  [00:50:47.880] [bbotk] Result of batch 4:
#> INFO  [00:50:47.883] [bbotk]         eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:47.883] [bbotk]  0.08838163 0.7743576        11         2.235016   0.6677882        0      0
#> INFO  [00:50:47.883] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:47.883] [bbotk]             0.107 0c1a3963-702e-408f-b76a-b519ded4dde7
#> INFO  [00:50:47.900] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:47.909] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:47.915] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:47.972] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:48.038] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:48.087] [mlr3] Finished benchmark
#> INFO  [00:50:48.132] [bbotk] Result of batch 5:
#> INFO  [00:50:48.135] [bbotk]        eta      gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:48.135] [bbotk]  0.6383713 0.05744726         8         6.555466    0.688853        0      0
#> INFO  [00:50:48.135] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:48.135] [bbotk]             0.112 82a6c452-0ddc-41ae-9aa5-af7bc1d3d352
#> INFO  [00:50:48.151] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:48.160] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:48.166] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:48.220] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:48.300] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:48.357] [mlr3] Finished benchmark
#> INFO  [00:50:48.395] [bbotk] Result of batch 6:
#> INFO  [00:50:48.398] [bbotk]       eta    gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:48.398] [bbotk]  0.125642 0.181675        19         1.169374   0.6812058        0      0
#> INFO  [00:50:48.398] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:48.398] [bbotk]             0.131 7ca02bf7-8c0b-4f16-a3cb-288e1da6a5c8
#> INFO  [00:50:48.414] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:48.422] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:48.428] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:48.480] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:48.531] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:48.585] [mlr3] Finished benchmark
#> INFO  [00:50:48.622] [bbotk] Result of batch 7:
#> INFO  [00:50:48.625] [bbotk]       eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:48.625] [bbotk]  0.837198 0.1869602        29         9.887145   0.6682595        0      0
#> INFO  [00:50:48.625] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:48.625] [bbotk]             0.097 b8e380be-0baa-433b-8d5b-6ca6956c700e
#> INFO  [00:50:48.641] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:48.649] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:48.656] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:48.707] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:48.763] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:48.812] [mlr3] Finished benchmark
#> INFO  [00:50:48.850] [bbotk] Result of batch 8:
#> INFO  [00:50:48.853] [bbotk]         eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:48.853] [bbotk]  0.07591018 0.8577488         5         5.058832   0.6663479        0      0
#> INFO  [00:50:48.853] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:48.853] [bbotk]             0.091 709306e0-7d90-4051-b94e-6b6ede312fb3
#> INFO  [00:50:48.870] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:48.879] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:48.885] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:48.935] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:49.043] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:49.106] [mlr3] Finished benchmark
#> INFO  [00:50:49.146] [bbotk] Result of batch 9:
#> INFO  [00:50:49.149] [bbotk]        eta     gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:49.149] [bbotk]  0.2607173 0.6375698        29          9.87441   0.6691351        0      0
#> INFO  [00:50:49.149] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:49.149] [bbotk]             0.163 a4c69de7-a3c2-44bb-8dab-cd6a882b163f
#> INFO  [00:50:49.166] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:49.175] [mlr3] Running benchmark with 3 resampling iterations
#> INFO  [00:50:49.181] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/3)
#> INFO  [00:50:49.240] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/3)
#> INFO  [00:50:49.503] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/3)
#> INFO  [00:50:49.551] [mlr3] Finished benchmark
#> INFO  [00:50:49.591] [bbotk] Result of batch 10:
#> INFO  [00:50:49.593] [bbotk]        eta    gamma max_depth min_child_weight surv.cindex warnings errors
#> INFO  [00:50:49.593] [bbotk]  0.8892446 0.334645         5         3.901119   0.6920791        0      0
#> INFO  [00:50:49.593] [bbotk]  runtime_learners                                uhash
#> INFO  [00:50:49.593] [bbotk]             0.312 c2ab1381-e30e-4ea6-837d-880c905f8ac2
#> INFO  [00:50:49.623] [bbotk] Finished optimizing after 10 evaluation(s)
#> INFO  [00:50:49.624] [bbotk] Result:
#> INFO  [00:50:49.626] [bbotk]        eta    gamma max_depth min_child_weight learner_param_vals  x_domain
#> INFO  [00:50:49.626] [bbotk]      <num>    <num>     <int>            <num>             <list>    <list>
#> INFO  [00:50:49.626] [bbotk]  0.8892446 0.334645         5         3.901119          <list[8]> <list[4]>
#> INFO  [00:50:49.626] [bbotk]  surv.cindex
#> INFO  [00:50:49.626] [bbotk]        <num>
#> INFO  [00:50:49.626] [bbotk]    0.6920791
learner = modelxgboost$learner$clone() # get the tuned learner
learner
#> <LearnerSurvXgboostCox:surv.xgboost.cox>: Extreme Gradient Boosting Cox
#> * Model: list
#> * Parameters: early_stopping_set=none, eta=0.8892, gamma=0.3346,
#>   max_depth=5, min_child_weight=3.901, nrounds=1, nthread=1, verbose=0
#> * Packages: mlr3, mlr3proba, mlr3extralearners, xgboost
#> * Predict Types:  [crank], distr, lp
#> * Feature Types: integer, numeric
#> * Properties: importance, missings, weights

# Recursive Feature Elimination with Cross Validation
instance = fsi(
  task = task$clone()$filter(part$train), # answers: how to use the train data (part$train)
  learner = learner, # answers: how to use the tuned learner from the previous step for feature selection
  resampling = rsmp("cv", folds = 6),
  measures = msr("surv.cindex"),
  terminator = trm("none")
)

# optimizer$optimize(instance) # Why does it give an error?
#> Error in eval(expr, envir, enclos): object 'optimizer' not found

# See `mlr3fselect::fsi`: you need to select one of the `mlr_fselectors`,
# and since you want RFE, it has to be this one:
fselector = fs("rfe", n_features = 2, feature_fraction = 0.8)
fselector$optimize(instance)
#> INFO  [00:50:50.161] [bbotk] Starting to optimize 9 parameter(s) with '<FSelectorRFE>' and '<TerminatorNone>'
#> INFO  [00:50:50.163] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:50.172] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:50.179] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:50.220] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:50.261] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:50.539] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:50.583] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:50.626] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:50.667] [mlr3] Finished benchmark
#> INFO  [00:50:50.699] [bbotk] Result of batch 1:
#> INFO  [00:50:50.703] [bbotk]   age diagtime karno prior  trt celltype.squamous celltype.smallcell
#> INFO  [00:50:50.703] [bbotk]  TRUE     TRUE  TRUE  TRUE TRUE              TRUE               TRUE
#> INFO  [00:50:50.703] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:50.703] [bbotk]            TRUE           TRUE   0.7046958        0      0            0.419
#> INFO  [00:50:50.703] [bbotk]                                 uhash
#> INFO  [00:50:50.703] [bbotk]  a4ff7734-9d05-4cd9-ae0a-8be79807e216
#> INFO  [00:50:50.784] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:50.794] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:50.801] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:50.844] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:50.886] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:50.931] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:50.974] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:51.027] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:51.068] [mlr3] Finished benchmark
#> INFO  [00:50:51.103] [bbotk] Result of batch 2:
#> INFO  [00:50:51.107] [bbotk]   age diagtime karno prior  trt celltype.squamous celltype.smallcell
#> INFO  [00:50:51.107] [bbotk]  TRUE     TRUE  TRUE FALSE TRUE              TRUE               TRUE
#> INFO  [00:50:51.107] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:51.107] [bbotk]           FALSE           TRUE   0.6971201        0      0            0.194
#> INFO  [00:50:51.107] [bbotk]                                 uhash
#> INFO  [00:50:51.107] [bbotk]  2fda12ab-a0b1-4035-afd8-dba2e07b9589
#> INFO  [00:50:51.186] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:51.194] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:51.200] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:51.747] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:52.038] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:52.082] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:52.379] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:52.429] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:52.469] [mlr3] Finished benchmark
#> INFO  [00:50:52.505] [bbotk] Result of batch 3:
#> INFO  [00:50:52.508] [bbotk]   age diagtime karno prior   trt celltype.squamous celltype.smallcell
#> INFO  [00:50:52.508] [bbotk]  TRUE     TRUE  TRUE FALSE FALSE              TRUE               TRUE
#> INFO  [00:50:52.508] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:52.508] [bbotk]           FALSE          FALSE   0.7205039        0      0            0.943
#> INFO  [00:50:52.508] [bbotk]                                 uhash
#> INFO  [00:50:52.508] [bbotk]  061330b4-861a-4390-b3ae-af7ea26620d9
#> INFO  [00:50:52.585] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:52.593] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:52.599] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:52.641] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:52.694] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:52.744] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:52.785] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:52.828] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:52.868] [mlr3] Finished benchmark
#> INFO  [00:50:52.903] [bbotk] Result of batch 4:
#> INFO  [00:50:52.906] [bbotk]   age diagtime karno prior   trt celltype.squamous celltype.smallcell
#> INFO  [00:50:52.906] [bbotk]  TRUE     TRUE  TRUE FALSE FALSE             FALSE               TRUE
#> INFO  [00:50:52.906] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:52.906] [bbotk]           FALSE          FALSE   0.6968156        0      0            0.205
#> INFO  [00:50:52.906] [bbotk]                                 uhash
#> INFO  [00:50:52.906] [bbotk]  87c9810b-6109-4cd4-99af-b88bc91f4a6c
#> INFO  [00:50:52.987] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:52.995] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:53.001] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:53.287] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:53.330] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:53.642] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:53.687] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:53.921] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:53.980] [mlr3] Finished benchmark
#> INFO  [00:50:54.017] [bbotk] Result of batch 5:
#> INFO  [00:50:54.021] [bbotk]   age diagtime karno prior   trt celltype.squamous celltype.smallcell
#> INFO  [00:50:54.021] [bbotk]  TRUE     TRUE  TRUE FALSE FALSE             FALSE              FALSE
#> INFO  [00:50:54.021] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:54.021] [bbotk]           FALSE          FALSE   0.6935666        0      0            0.902
#> INFO  [00:50:54.021] [bbotk]                                 uhash
#> INFO  [00:50:54.021] [bbotk]  0226dae0-acae-4e9d-bd95-07bea64fd64f
#> INFO  [00:50:54.103] [bbotk] Evaluating 1 configuration(s)
#> INFO  [00:50:54.111] [mlr3] Running benchmark with 6 resampling iterations
#> INFO  [00:50:54.118] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 1/6)
#> INFO  [00:50:54.200] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 2/6)
#> INFO  [00:50:54.244] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 3/6)
#> INFO  [00:50:54.287] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 4/6)
#> INFO  [00:50:54.339] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 5/6)
#> INFO  [00:50:54.383] [mlr3] Applying learner 'surv.xgboost.cox' on task 'survival::veteran' (iter 6/6)
#> INFO  [00:50:54.424] [mlr3] Finished benchmark
#> INFO  [00:50:54.467] [bbotk] Result of batch 6:
#> INFO  [00:50:54.470] [bbotk]   age diagtime karno prior   trt celltype.squamous celltype.smallcell
#> INFO  [00:50:54.470] [bbotk]  TRUE    FALSE  TRUE FALSE FALSE             FALSE              FALSE
#> INFO  [00:50:54.470] [bbotk]  celltype.adeno celltype.large surv.cindex warnings errors runtime_learners
#> INFO  [00:50:54.470] [bbotk]           FALSE          FALSE   0.7012597        0      0            0.239
#> INFO  [00:50:54.470] [bbotk]                                 uhash
#> INFO  [00:50:54.470] [bbotk]  8c1472ee-f3c3-4776-be7b-548921dd3be4
#> INFO  [00:50:54.557] [bbotk] Finished optimizing after 6 evaluation(s)
#> INFO  [00:50:54.558] [bbotk] Result:
#> INFO  [00:50:54.561] [bbotk]     age diagtime  karno  prior    trt celltype.squamous celltype.smallcell
#> INFO  [00:50:54.561] [bbotk]  <lgcl>   <lgcl> <lgcl> <lgcl> <lgcl>            <lgcl>             <lgcl>
#> INFO  [00:50:54.561] [bbotk]    TRUE     TRUE   TRUE  FALSE  FALSE              TRUE               TRUE
#> INFO  [00:50:54.561] [bbotk]  celltype.adeno celltype.large                                   importance
#> INFO  [00:50:54.561] [bbotk]          <lgcl>         <lgcl>                                       <list>
#> INFO  [00:50:54.561] [bbotk]           FALSE          FALSE 5.000000,3.666667,2.833333,1.833333,1.666667
#> INFO  [00:50:54.561] [bbotk]                                                 features n_features surv.cindex
#> INFO  [00:50:54.561] [bbotk]                                                   <list>      <int>       <num>
#> INFO  [00:50:54.561] [bbotk]  age,diagtime,karno,celltype.squamous,celltype.smallcell          5   0.7205039
#>       age diagtime  karno  prior    trt celltype.squamous celltype.smallcell
#>    <lgcl>   <lgcl> <lgcl> <lgcl> <lgcl>            <lgcl>             <lgcl>
#> 1:   TRUE     TRUE   TRUE  FALSE  FALSE              TRUE               TRUE
#>    celltype.adeno celltype.large                                   importance
#>            <lgcl>         <lgcl>                                       <list>
#> 1:          FALSE          FALSE 5.000000,3.666667,2.833333,1.833333,1.666667
#>                                                   features n_features
#>                                                     <list>      <int>
#> 1: age,diagtime,karno,celltype.squamous,celltype.smallcell          5
#>    surv.cindex
#>          <num>
#> 1:   0.7205039

as.data.table(instance$archive)
#>       age diagtime  karno  prior    trt celltype.squamous celltype.smallcell
#>    <lgcl>   <lgcl> <lgcl> <lgcl> <lgcl>            <lgcl>             <lgcl>
#> 1:   TRUE     TRUE   TRUE   TRUE   TRUE              TRUE               TRUE
#> 2:   TRUE     TRUE   TRUE  FALSE   TRUE              TRUE               TRUE
#> 3:   TRUE     TRUE   TRUE  FALSE  FALSE              TRUE               TRUE
#> 4:   TRUE     TRUE   TRUE  FALSE  FALSE             FALSE               TRUE
#> 5:   TRUE     TRUE   TRUE  FALSE  FALSE             FALSE              FALSE
#> 6:   TRUE    FALSE   TRUE  FALSE  FALSE             FALSE              FALSE
#>    celltype.adeno celltype.large surv.cindex runtime_learners
#>            <lgcl>         <lgcl>       <num>            <num>
#> 1:           TRUE           TRUE   0.7046958            0.419
#> 2:          FALSE           TRUE   0.6971201            0.194
#> 3:          FALSE          FALSE   0.7205039            0.943
#> 4:          FALSE          FALSE   0.6968156            0.205
#> 5:          FALSE          FALSE   0.6935666            0.902
#> 6:          FALSE          FALSE   0.7012597            0.239
#>              timestamp batch_nr warnings errors
#>                 <POSc>    <int>    <int>  <int>
#> 1: 2024-05-16 00:50:50        1        0      0
#> 2: 2024-05-16 00:50:51        2        0      0
#> 3: 2024-05-16 00:50:52        3        0      0
#> 4: 2024-05-16 00:50:52        4        0      0
#> 5: 2024-05-16 00:50:54        5        0      0
#> 6: 2024-05-16 00:50:54        6        0      0
#>                                                   importance
#>                                                       <list>
#> 1: 9.000000,7.666667,6.166667,5.333333,4.833333,4.166667,...
#> 2: 7.000000,5.666667,4.166667,3.833333,2.833333,2.833333,...
#> 3:              5.000000,3.666667,2.833333,1.833333,1.666667
#> 4:                       4.000000,2.666667,2.000000,1.333333
#> 5:                                                     3,2,1
#> 6:                                                       2,1
#>                                                           features n_features
#>                                                             <list>     <list>
#> 1:              age,diagtime,karno,prior,trt,celltype.squamous,...          9
#> 2: age,diagtime,karno,trt,celltype.squamous,celltype.smallcell,...          7
#> 3:         age,diagtime,karno,celltype.squamous,celltype.smallcell          5
#> 4:                           age,diagtime,karno,celltype.smallcell          4
#> 5:                                              age,diagtime,karno          3
#> 6:                                                       age,karno          2
#>     resample_result
#>              <list>
#> 1: <ResampleResult>
#> 2: <ResampleResult>
#> 3: <ResampleResult>
#> 4: <ResampleResult>
#> 5: <ResampleResult>
#> 6: <ResampleResult>
# I also want to subset the task to the optimal feature set 
# and again train the learner.

task$select(instance$result_feature_set)
learner$train(task, part$train)

# And finally, use the trained model to predict on the test data
# (this now works without an error).

p = learner$predict(task, part$test)
p
#> <PredictionSurv> for 28 observations:
#>     row_ids time status        crank           lp     distr
#>           1   72   TRUE -0.458429651 -0.458429651 <list[1]>
#>           7   82   TRUE -0.255330144 -0.255330144 <list[1]>
#>          13  144   TRUE  1.800571186  1.800571186 <list[1]>
#> ---                                                        
#>         135  231   TRUE -0.644105278 -0.644105278 <list[1]>
#>          21  123  FALSE  1.800571186  1.800571186 <list[1]>
#>          22   97  FALSE  0.005314152  0.005314152 <list[1]>

Created on 2024-05-16 with reprex v2.0.2
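
On the last part of your original question (test-set performance and time-dependent AUC), a minimal sketch, assuming the objects from the example above: `surv.cindex` is the concordance index, and `surv.uno_auc` is Uno's time-dependent AUC measure in mlr3proba, which additionally needs the task and the train-set indices to estimate the censoring distribution.

p$score(msr("surv.cindex"))
# Uno's integrated time-dependent AUC; requires the train data for the
# censoring distribution estimate:
p$score(msr("surv.uno_auc"), task = task, train_set = part$train)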

bblodfon commented 4 months ago

Also, what you are doing might not be optimal: you first tune the learner using all features (so the hyperparameters get fixed) and then do RFE with those same hyperparameters for every feature subset, whereas each subset might have different optimal hyperparameters. If you want to do this properly (which also means much heavier computation), it would mean nested resampling; see this post and the sketch below.
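
A sketch of what that nested setup could look like (hedged, adapted from the mlr3tuning/mlr3fselect documentation; random search over feature subsets is used here for simplicity, since RFE additionally requires importance scores from the wrapped learner):

# Inner loop: hyperparameter tuning.
at = auto_tuner(
  tuner = tnr("random_search"),
  learner = lrn("surv.xgboost.cox", eta = to_tune(1e-4, 1), max_depth = to_tune(1, 35)),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 10)
)

# Middle loop: feature selection, re-tuning the hyperparameters for
# every candidate feature subset.
afs = auto_fselector(
  fselector = fs("random_search"),
  learner = at,
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 10)
)

# Outer loop: unbiased performance estimate of the whole pipeline.
rr = resample(task, afs, rsmp("cv", folds = 3))
rr$aggregate(msr("surv.cindex"))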

fa1999abdi commented 4 months ago

Thanks John.

Also, I asked in issue #363 about getting the survival time prediction, but it doesn't seem like the response column is the real survival time prediction.

[screenshot of the prediction output]

Why are the survival times so close for individuals with different times and statuses? (Why do row_ids 2 and 3 have exactly the same response despite different times and statuses? What does that mean?)

bblodfon commented 4 months ago

It's the cumulative hazard (method), which as you say doesn't correspond to a true survival time. In this case it happens because it's a crank composition and we added the response as an extra (without thinking too much about the actual values). The cum_haz method uses the distr prediction and the computation is very simple, see ?survivalmodels::surv_to_risk(). The same is done in RSFs. In your case the two distr predictions must be the same, which gives the same response.
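
The idea in a few lines of R (a sketch of the computation behind surv_to_risk(), not the package code itself):

# Given a matrix `surv` of survival probabilities (rows = observations,
# columns = time points), the expected-mortality risk is the row sum of
# the cumulative hazard H(t) = -log(S(t)); identical distr rows thus
# yield identical risks/responses.
risk_from_surv = function(surv) {
  rowSums(-log(surv))
}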

In general, I am aware of this issue and want to change the default response in the composition to the RMST (restricted mean survival time), which is a very nice method for the expected survival time, backed by the literature. Will ping you when it is done in the Discussion.
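
For intuition, the RMST is just the area under the survival curve up to a horizon tau. A minimal sketch (hypothetical helper using the trapezoidal rule; not mlr3proba code):

rmst = function(times, surv, tau = max(times)) {
  keep = times <= tau
  t = c(0, times[keep])  # the survival curve starts at S(0) = 1
  s = c(1, surv[keep])
  # trapezoidal integration of S(t) on [0, tau]
  sum(diff(t) * (head(s, -1) + tail(s, -1)) / 2)
}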

fa1999abdi commented 4 months ago

With these explanations, how can I access the survival probabilities (predictions$survival_probabilities) to plot survival curves for the test set?

bblodfon commented 4 months ago

We have some plotting functions based on matplot (we will update them to use ggplot soon), but basically you can get the p$data$distr matrix or p$distr$survival(time_points_of_your_interest), where p is the PredictionSurv object, and plot them yourself; see the sketch below.
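
A minimal sketch (assuming p$data$distr is a matrix of survival probabilities with the time points as column names, as described above):

surv_mat = p$data$distr
times = as.numeric(colnames(surv_mat))
# one survival curve per test observation
matplot(times, t(surv_mat), type = "l", lty = 1, col = "grey40",
        xlab = "Time", ylab = "Survival probability",
        main = "Predicted survival curves (test set)")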

fa1999abdi commented 3 months ago

Thanks so much for your help.