mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0

parallelization issues with coxph #279

Closed lenaeb closed 9 months ago

lenaeb commented 2 years ago

Hi all,

we want to run a nested cross-validation with surv.coxph and therefore want to parallelize the outer loop (using future).

Unfortunately the parallelization does not seem to work: the runtime does not decrease with more workers. E.g. with workers = 4 we need 4.424515 mins, and with 2 workers we need 2.448493 mins. Using nested cross-validation for a classification task we could nicely see the decrease in time when increasing the number of workers.

Below you can find an example:

library(mlr3)
library(mlr3fselect)
library(mlr3learners)
library(dplyr)
library(mlr3proba)
task = TaskSurv$new(survival::rats %>% select(-sex), id = "right_censored",
                    time = "time", event = "status", type = "right")

# Learner
learner_auto <- AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = F, store_benchmark_result = F, store_fselect_instance = T,
  terminator = trm("evals", n_evals = 100000),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
outer_rsmp <- rsmp("subsampling", repeats = 16, ratio = 0.7)
outer_rsmp$instantiate(task)

## benchmark grid
design <- benchmark_grid(
  learners = learner_auto,
  tasks = task,
  resamplings = outer_rsmp
)

# parallelization
future::plan(list(future::tweak("multisession", **workers = 4)**, 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)

# parallelization
future::plan(list(future::tweak("multisession", **workers = 2**), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)

Best regards, Lena

RaphaelS1 commented 2 years ago

Hey, sorry, I can't get your example working. Can you please use reprex::reprex?

lenaeb commented 2 years ago

Hi Raphael,

sorry about that. I hope it works now, if not, please let me know!

Best, Lena

library(mlr3)
library(mlr3fselect)
library(mlr3learners)
library(dplyr)
library(mlr3proba)
task = TaskSurv$new(survival::rats %>% select(-sex), id = "right_censored",
                    time = "time", event = "status", type = "right")

# Learner
learner_auto <- AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = F, store_benchmark_result = F, store_fselect_instance = T,
  terminator = trm("evals", n_evals = 100000),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
outer_rsmp <- rsmp("subsampling", repeats = 16, ratio = 0.7)
outer_rsmp$instantiate(task)

## benchmark grid
design <- benchmark_grid(
  learners = learner_auto,
  tasks = task,
  resamplings = outer_rsmp
)

# parallelization
future::plan(list(future::tweak("multisession", workers = 4), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 2.697443 mins

# parallelization
future::plan(list(future::tweak("multisession", workers = 2), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 1.315648 mins

Created on 2022-06-01 by the reprex package (v2.0.0)

RaphaelS1 commented 2 years ago

Thanks, I removed the uninformative INFO output. It looks strange to me because workers = 2 actually takes half the time of workers = 4. I haven't done any work with parallelisation in mlr3proba TBH, so maybe another member of the team who works more with future can help. Maybe @be-marc?
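
For reference, a minimal way to silence that INFO output, using the same lgr calls that appear in the later reprexes:

lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")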

be-marc commented 2 years ago

Doubling the time with more cores is very strange. Two ideas:

1) Do you run into memory issues with 4 workers? Using the swap would slow down the calculations.
2) How fast is the sequential execution? future needs to serialize and deserialize the objects which are passed between the main process and the workers. When serialization is a major overhead, 4 workers might be slower than 2 (a quick check follows below).
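
A minimal sketch of how idea 2 could be checked, assuming the design object from the reprex above:

# Time and size the serialization of the benchmark design, which is roughly
# what future has to ship to each multisession worker.
raw = serialize(design, connection = NULL)
length(raw) / 1e6                                 # approximate payload size in MB
system.time(serialize(design, connection = NULL))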

lenaeb commented 2 years ago

I think it's 2). You can see below that without parallelization it takes 12 sec, with 2 workers 2 min, with 4 workers 3 min, and with 8 workers 6 min.

library(mlr3)
library(mlr3fselect)
library(mlr3learners)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(mlr3proba)
task = TaskSurv$new(survival::rats %>% select(-sex), id = "right_censored",
                    time = "time", event = "status", type = "right")

# Learner
learner_auto <- AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = F, store_benchmark_result = F, store_fselect_instance = T,
  terminator = trm("evals", n_evals = 100000),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
outer_rsmp <- rsmp("subsampling", repeats = 16, ratio = 0.7)
outer_rsmp$instantiate(task)

## benchmark grid
design <- benchmark_grid(
  learners = learner_auto,
  tasks = task,
  resamplings = outer_rsmp
)

lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 12.27735 secs

# parallelization
future::plan(list(future::tweak("multisession", workers = 2), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 2.162249 mins

# parallelization
future::plan(list(future::tweak("multisession", workers = 4), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 3.286482 mins

# parallelization
future::plan(list(future::tweak("multisession", workers = 8), 'sequential') )

start = Sys.time()
bench_res <- benchmark(design,store_models = T, store_backends = F) 
end = Sys.time()
print(end-start)
#> Time difference of 6.048725 mins

Created on 2022-06-01 by the reprex package (v2.0.0)

be-marc commented 2 years ago

Thanks for testing. I will investigate this further.

be-marc commented 2 years ago

I will post a few tests.

Learner serialization.

library(mlr3proba)
library(tictoc)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

learner = lrn("surv.coxph")
learner$train(task)

# serialize
tic()
x = serialize(learner, connection = NULL)
toc()
# 0.011 seconds

be-marc commented 2 years ago

AutoFSelector with random search and full batch.

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("evals", n_evals = 100),
  fselector = fs("random_search", batch_size = 100)
)

## sequential
future::plan("sequential")

tic()
afs$train(task)
toc()
# 35 seconds

## multisession
future::plan("multisession", workers = 4)

tic()
afs$train(task)
toc()
# 20 seconds

be-marc commented 2 years ago

Nested resampling.

resampling = rsmp("cv", folds = 3)

## sequential
future::plan("sequential")

tic()
rr = resample(task, afs, resampling)
toc()
# 112 seconds

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
rr = resample(task, afs, resampling)
toc()
# 64 seconds

be-marc commented 2 years ago

AutoFSelector with exhaustive search.

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

## sequential
future::plan("sequential")

tic()
rr = resample(task, afs, resampling)
toc()
# 1.5 seconds

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
rr = resample(task, afs, resampling)
toc()
# 4.4 seconds

be-marc commented 2 years ago

AutoFSelector with more outer iterations

resampling = rsmp("repeated_cv", folds = 3, repeats = 10)

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

## sequential
future::plan("sequential")

tic()
rr = resample(task, afs, resampling)
toc()
# 26 seconds

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
rr = resample(task, afs, resampling)
toc()
# 14 seconds

be-marc commented 2 years ago

AutoFSelector with subsampling.

resampling = rsmp("subsampling", repeats = 16, ratio = 0.7)

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv",folds = 3),
  measure = msr( 'surv.cindex'),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

## sequential
future::plan("sequential")

tic()
rr = resample(task, afs, resampling)
toc()
# 13 seconds

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
rr = resample(task, afs, resampling)
toc()
# 8 seconds

be-marc commented 2 years ago

@lenaeb I cannot reproduce the issue. Can you run the following test on your machine and confirm?

library(mlr3proba)
library(mlr3fselect)
library(tictoc)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
resampling = rsmp("subsampling", repeats = 16, ratio = 0.7)
resampling$instantiate(task)

## benchmark grid
design = benchmark_grid(
  learners = afs,
  tasks = task,
  resamplings = resampling
)

## sequential
future::plan("sequential")

tic()
bmr = benchmark(design,store_models = TRUE, store_backends = FALSE) 
toc()
# 15 seconds

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
bmr = benchmark(design, store_models = TRUE, store_backends = FALSE) 
toc()
# 8 seconds

lenaeb commented 2 years ago

The problem still remains. Here you can find the output when we run the script on the cluster:

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3fselect)
library(tictoc)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
resampling = rsmp("subsampling", repeats = 16, ratio = 0.7)
resampling$instantiate(task)

## benchmark grid
design = benchmark_grid(
  learners = afs,
  tasks = task,
  resamplings = resampling
)

lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

## sequential
future::plan("sequential")

tic()
bmr = benchmark(design,store_models = TRUE, store_backends = FALSE) 
toc()
#> 14.753 sec elapsed

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
bmr = benchmark(design, store_models = TRUE, store_backends = FALSE) 
toc()
#> 142.167 sec elapsed

Created on 2022-06-01 by the reprex package (v2.0.0)

Here on the local machine, the runtime is more or less the same...

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3fselect)
library(tictoc)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
resampling = rsmp("subsampling", repeats = 16, ratio = 0.7)
resampling$instantiate(task)

## benchmark grid
design = benchmark_grid(
  learners = afs,
  tasks = task,
  resamplings = resampling
)

lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

## sequential
future::plan("sequential")

tic()
bmr = benchmark(design,store_models = TRUE, store_backends = FALSE) 
toc()
#> 17.09 sec elapsed

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
bmr = benchmark(design, store_models = TRUE, store_backends = FALSE) 
toc()
#> 14.58 sec elapsed

Created on 2022-06-01 by the reprex package (v2.0.1)

be-marc commented 2 years ago

More or less the same is okay. fs("exhaustive_search", max_features = 1) produces only 2 feature combinations and therefore finishes very quickly, so the parallelization overhead outweighs any gain in this case.
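
A quick sketch of where the 2 combinations come from, using the same task construction as in the tests above:

library(mlr3proba)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

task$feature_names                       # only "litter" and "rx" remain
choose(length(task$feature_names), 1)    # 2 single-feature candidate sets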

It gets complicated when it only happens on the cluster. Do you have a dedicated machine there? And do you run the script with future::plan(list(future::tweak("multisession", workers = 2), "sequential")) on the cluster? @RaphaelS1 Can you run my last example on your local machine?

lenaeb commented 2 years ago

Yes, we have a dedicated machine there, and yes, we use this code snippet to run it on the cluster.

be-marc commented 2 years ago

Please try to update all packages, run the script again, and post your sessionInfo(). One package might be outdated and produce large objects that are slow to serialize.

lenaeb commented 2 years ago

Only mlr3learners had an update available (--> 0.5.3); unfortunately it does not help.

So I updated the mlr3, mlr3fselect, and mlr3proba packages to the dev versions via remotes::install_github("mlr-org/**").

Here is the output, still the same:

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3fselect)
library(tictoc)

task = TaskSurv$new(survival::rats, id = "right_censored", time = "time", event = "status", type = "right")
task$select(setdiff(task$feature_names, "sex"))

afs = AutoFSelector$new(
  learner = lrn("surv.coxph"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("surv.cindex"),
  store_models = FALSE, 
  store_benchmark_result = FALSE,
  store_fselect_instance = TRUE,
  terminator = trm("none"),
  fselector = fs("exhaustive_search", max_features = 1)
)

# outer loop
resampling = rsmp("subsampling", repeats = 16, ratio = 0.7)
resampling$instantiate(task)

## benchmark grid
design = benchmark_grid(
  learners = afs,
  tasks = task,
  resamplings = resampling
)

lgr::get_logger("bbotk")$set_threshold("warn")
lgr::get_logger("mlr3")$set_threshold("warn")

## sequential
future::plan("sequential")

tic()
bmr = benchmark(design,store_models = TRUE, store_backends = FALSE) 
toc()
#> 14.652 sec elapsed

## multisession
future::plan(list(future::tweak("multisession", workers = 2), "sequential"))

tic()
bmr = benchmark(design, store_models = TRUE, store_backends = FALSE) 
toc()
#> 133.025 sec elapsed

sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Red Hat Enterprise Linux 8.2 (Ootpa)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /apps/rocs/2020.08/cascadelake/software/OpenBLAS/0.3.9-GCC-9.3.0/lib/libopenblas_skylakexp-r0.3.9.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tictoc_1.0.1      mlr3fselect_0.7.1 mlr3proba_0.4.12  mlr3_0.13.3-9000 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8.3         compiler_4.0.5       highr_0.9           
#>  [4] mlr3misc_0.10.0      tools_4.0.5          digest_0.6.29       
#>  [7] uuid_1.1-0           lattice_0.20-44      evaluate_0.15       
#> [10] checkmate_2.1.0      dictionar6_0.1.3     rlang_1.0.2         
#> [13] Matrix_1.3-3         reprex_2.0.0         cli_3.3.0           
#> [16] rstudioapi_0.13      bbotk_0.5.3          yaml_2.2.1          
#> [19] parallel_4.0.5       xfun_0.22            withr_2.5.0         
#> [22] stringr_1.4.0        knitr_1.33           fs_1.5.0            
#> [25] globals_0.15.0       grid_4.0.5           glue_1.6.2          
#> [28] data.table_1.14.2    listenv_0.8.0        param6_0.2.4        
#> [31] distr6_1.6.10        R6_2.5.1             future.apply_1.9.0  
#> [34] parallelly_1.31.1    survival_3.2-11      rmarkdown_2.11      
#> [37] lgr_0.4.3            magrittr_2.0.3       mlr3pipelines_0.4.1 
#> [40] splines_4.0.5        backports_1.4.1      codetools_0.2-18    
#> [43] htmltools_0.5.1.1    palmerpenguins_0.1.0 future_1.26.1       
#> [46] set6_0.2.4           paradox_0.9.0        ooplah_0.2.0        
#> [49] stringi_1.6.1        crayon_1.5.1

Created on 2022-06-07 by the reprex package (v2.0.0)

bblodfon commented 9 months ago

Hi @lenaeb, this seems more like a technical issue rather than a problem with mlr3proba. Out of curiosity, did you get it to work the way you wanted in the end? Stack Overflow might be a better place to address this sort of question :)

bblodfon commented 9 months ago

Closed due to inactivity