tidymodels / tune

Tools for tidy parameter tuning
https://tune.tidymodels.org
Other
282 stars 42 forks source link

delayed logging when using parallel processing with future #927

Closed mihem closed 3 months ago

mihem commented 3 months ago

The problem

When I use doMC::registerDoMC(cores = 6) logging works fine, but since doMC is supposed to be deprecated soon, I switched to future. However, here logging is delayed (i.e. it does not show up till the process is finished, which can take some time for complex model)

What I mean by logging is:

i Fold01: preprocessor 1/1
✓ Fold01: preprocessor 1/1
i Fold01: preprocessor 1/1, model 1/1
✓ Fold01: preprocessor 1/1, model 1/1
i Fold01: preprocessor 1/1, model 1/1 (extracts)
i Fold01: preprocessor 1/1, model 1/1 (predictions)

@simonpcouch maybe?

Reproducible example

library(tidymodels)
library(tune)

set.seed(1234)
data_folds <- vfold_cv(mtcars, v = 10, repeats = 1)

data_recipe <-
  recipe(mpg ~ ., data = mtcars)  |>
  step_normalize(all_predictors())

lm_model <-
  linear_reg(mode = "regression", penalty = tune(), mixture = 1) |>
  set_engine("glmnet")

lm_workflow <-
  workflow() |>
  add_model(lm_model) |>
  add_recipe(data_recipe)

doMC::registerDoMC(cores = 6)

# now logging works fine
cars_res <-
    lm_workflow |>
    tune_grid(
        resamples = data_folds,
        grid = 10,
        control = control_grid(verbose = TRUE)
    )

future::plan(future::multisession, workers = 6)

# now logging does not show up till the process is finished
cars_res <-
    lm_workflow |>
    tune_grid(
        resamples = data_folds,
        grid = 10,
        control = control_grid(verbose = TRUE)
    )

Created on 2024-07-28 with reprex v2.1.0

Session info ``` r sessionInfo() #> R version 4.3.1 (2023-06-16) #> Platform: x86_64-pc-linux-gnu (64-bit) #> Running under: Ubuntu 22.04.4 LTS #> #> Matrix products: default #> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 #> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0 #> #> locale: #> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C #> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8 #> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 #> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C #> [9] LC_ADDRESS=C LC_TELEPHONE=C #> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C #> #> time zone: Europe/Berlin #> tzcode source: system (glibc) #> #> attached base packages: #> [1] stats graphics grDevices utils datasets methods base #> #> other attached packages: #> [1] glmnet_4.1-8 Matrix_1.6-5 yardstick_1.3.1 workflowsets_1.1.0 #> [5] workflows_1.1.4 tune_1.2.1 tidyr_1.3.1 tibble_3.2.1 #> [9] rsample_1.2.1 recipes_1.0.10 purrr_1.0.2 parsnip_1.2.1 #> [13] modeldata_1.3.0 infer_1.0.7 ggplot2_3.5.1 dplyr_1.1.4 #> [17] dials_1.2.1 scales_1.3.0 broom_1.0.5 tidymodels_1.2.0 #> #> loaded via a namespace (and not attached): #> [1] tidyselect_1.2.1 timeDate_4032.109 R.utils_2.12.3 #> [4] fastmap_1.1.1 reprex_2.1.0 digest_0.6.35 #> [7] rpart_4.1.23 timechange_0.3.0 lifecycle_1.0.4 #> [10] survival_3.5-8 magrittr_2.0.3 compiler_4.3.1 #> [13] doMC_1.3.8 rlang_1.1.3 tools_4.3.1 #> [16] utf8_1.2.4 yaml_2.3.8 data.table_1.15.4 #> [19] knitr_1.46 DiceDesign_1.10 R.cache_0.16.0 #> [22] withr_3.0.0 R.oo_1.26.0 nnet_7.3-19 #> [25] grid_4.3.1 fansi_1.0.6 colorspace_2.1-0 #> [28] future_1.33.2 globals_0.16.3 iterators_1.0.14 #> [31] MASS_7.3-60.0.1 cli_3.6.2 rmarkdown_2.26 #> [34] generics_0.1.3 rstudioapi_0.16.0 future.apply_1.11.2 #> [37] splines_4.3.1 parallel_4.3.1 vctrs_0.6.5 #> [40] hardhat_1.3.1 listenv_0.9.1 foreach_1.5.2 #> [43] clipr_0.8.0 gower_1.0.1 glue_1.7.0 #> [46] parallelly_1.37.1 codetools_0.2-20 shape_1.4.6.1 #> [49] lubridate_1.9.3 gtable_0.3.5 munsell_0.5.1 #> [52] GPfit_1.0-8 styler_1.10.3 pillar_1.9.0 #> [55] furrr_0.3.1 htmltools_0.5.8.1 ipred_0.9-14 #> [58] lava_1.8.0 R6_2.5.1 lhs_1.1.6 #> [61] evaluate_0.23 lattice_0.22-6 R.methodsS3_1.8.2 #> [64] backports_1.4.1 class_7.3-22 Rcpp_1.0.12 #> [67] prodlim_2023.08.28 xfun_0.43 fs_1.6.3 #> [70] pkgconfig_2.0.3 ```
simonpcouch commented 3 months ago

Thanks for the thorough issue description, @mihem! I happen to see no logging at all when using doMC, but that's maybe beside the point---

From the control_grid() documentation:

verbose: A logical for logging results (other than warnings and errors, which are always shown) as they are generated during training in a single R process. When using most parallel backends, this argument typically will not result in any logging.

(Emphasis my own.) It's a nice artifact that tune is able to print logs with doMC forking in your case, though is on "off-label" feature. For now, this behaves as documented and will stay this way until we further some deprecations.

That said, once we've fully transitioned to future from foreach, we'll be able to come back to this and have the same logging output for both sequential and parallel tuning. :)

mihem commented 3 months ago

Thanks for the swift response.

Okay because I'll read that you said that future has better logging but right now doMC has the better logging. So I will continue to use doMC because it's an important feature for me. So would be nice if doMC kept working until real time logging works with future.

Thanks.

github-actions[bot] commented 3 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.