tidymodels / tune

Tools for tidy parameter tuning
https://tune.tidymodels.org

Improve or document how to export functions used in recipes, custom models to parallel workers #364

Closed: smingerson closed this issue 10 months ago

smingerson commented 3 years ago

I've been running into problems with necessary objects not being carried along to the workers when tuning workflows in parallel. This crops up in two places. Below is an example with a function defined in a script and used in step_mutate(). It works when fitting locally, but not in parallel. If I call parallel::clusterExport(cl, c("add1")) before tune::tune_grid(), it works, but this gets rather clunky with a real problem.
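A minimal sketch of that workaround, using the cluster cl and the objects from the reprex below:

# PSOCK workers start as fresh R sessions, so they never see objects
# defined in the global environment; export them explicitly before tuning.
parallel::clusterExport(cl, c("add1"))
tune::tune_grid(wf, rsamp, grid = 20)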

Another place where (I think) this crops up is a custom parsnip model that is not defined in a package. The registration performed by the parsnip::set_*() functions does not appear to propagate to the workers. I think the fix would be to call parallel::clusterEvalQ() with the parsnip::set_*() calls in the expression.
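A sketch of that second workaround; the model name and the specific registration calls here are placeholders, not a real model definition:

# Re-run the custom model registration in each worker's session.
parallel::clusterEvalQ(cl, {
  library(parsnip)
  set_new_model("my_custom_model")                  # placeholder name
  set_model_mode("my_custom_model", "regression")
  # ... the remaining set_model_engine()/set_fit()/set_pred() calls ...
})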

I think some documentation pointing to these workarounds would be great; I'd be happy to create a PR to that effect if desired. Alternatively, control_grid() could gain a prework argument, similar to drake::drake_config()'s prework argument, to register an expression that is run on the workers through clusterEvalQ(). Or this could live in a tune::cluster_setup() function, so the expression is not re-run when the same cluster is used across multiple tune_*() calls. I'd be happy to attempt a PR here too.
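Purely to illustrate the proposal, a hypothetical prework interface might look like the following; neither the prework argument nor cluster_setup() exists in tune today:

# Hypothetical API: an expression tune would evaluate once per worker
# via parallel::clusterEvalQ() before any fits run.
ctrl <- control_grid(
  prework = quote({                 # `prework` is proposed, not real
    library(parsnip)
    source("my_custom_model.R")     # placeholder setup script
  })
)
tune::tune_grid(wf, rsamp, grid = 20, control = ctrl)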

cl <- parallel::makePSOCKcluster(2)
doParallel::registerDoParallel(cl)
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(tune)
library(rsample)
library(parsnip)
library(workflows)
add1 <- function(x) x + 1

# If I ran `parallel::clusterExport(cl, c("add1"))`
# the parallel workflow would work.

rsamp <- rsample::bootstraps(mtcars)
rec <- recipe(mtcars) %>% 
  step_mutate(hp = add1(hp)) %>% 
  update_role(mpg, new_role = "outcome") %>% 
  update_role(hp, disp, wt, new_role = "predictor") 
mod <- linear_reg(mode = "regression", penalty = tune()) %>% set_engine("glmnet")

wf <- workflows::workflow() %>% 
  add_recipe(rec) %>% 
  add_model(mod) 
# works fine when fitting locally
fit(wf, data = mtcars)
#> == Workflow [trained] ==========================================================
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> -- Preprocessor ----------------------------------------------------------------
#> 1 Recipe Step
#> 
#> * step_mutate()
#> 
#> -- Model -----------------------------------------------------------------------
#> 
#> Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian") 
#> 
#>    Df  %Dev Lambda
#> 1   0  0.00 5.1470
#> 2   1 12.78 4.6900
#> 3   1 23.39 4.2730
#> 4   2 32.47 3.8940
#> 5   2 40.22 3.5480
#> 6   3 46.85 3.2320
#> 7   3 52.93 2.9450
#> 8   3 57.98 2.6840
#> 9   3 62.18 2.4450
#> 10  3 65.66 2.2280
#> 11  3 68.55 2.0300
#> 12  3 70.95 1.8500
#> 13  3 72.94 1.6850
#> 14  3 74.60 1.5360
#> 15  3 75.97 1.3990
#> 16  3 77.11 1.2750
#> 17  3 78.05 1.1620
#> 18  3 78.84 1.0580
#> 19  3 79.49 0.9645
#> 20  3 80.03 0.8788
#> 21  3 80.48 0.8007
#> 22  3 80.86 0.7296
#> 23  3 81.17 0.6648
#> 24  3 81.43 0.6057
#> 25  3 81.64 0.5519
#> 26  3 81.82 0.5029
#> 27  3 81.96 0.4582
#> 28  3 82.09 0.4175
#> 29  3 82.19 0.3804
#> 30  3 82.27 0.3466
#> 31  3 82.34 0.3158
#> 32  3 82.40 0.2878
#> 33  3 82.45 0.2622
#> 34  3 82.49 0.2389
#> 35  3 82.52 0.2177
#> 36  3 82.55 0.1983
#> 37  3 82.57 0.1807
#> 38  3 82.59 0.1647
#> 39  3 82.61 0.1500
#> 40  3 82.62 0.1367
#> 41  3 82.63 0.1246
#> 42  3 82.64 0.1135
#> 43  3 82.65 0.1034
#> 44  3 82.65 0.0942
#> 45  3 82.66 0.0859
#> 46  3 82.66 0.0782
#> 
#> ...
#> and 9 more lines.
# all error
tune::tune_grid(wf, rsamp, grid = 20)
# this isn't what I get when running in a clean session outside of reprex...
#> Error in checkForRemoteErrors(lapply(cl, recvResult)): 2 nodes produced errors; first error: object '.doSnowGlobals' not found

Instead of that last error, I see the following when running in a clean R session. The underlying problem looks the same either way: the PSOCK workers are fresh R sessions that never see add1() from the global environment. Maybe the .doSnowGlobals error is some weird interaction between how reprex::reprex() is called and parallelization?

# Warning: All models failed. See the `.notes` column.
# Warning: This tuning result has notes. Example notes on model fitting include:
#   preprocessor 1/1: Error: Problem with `mutate()` input `hp`.
# x no terms in scope
# i Input `hp` is `add1(hp)`.
# preprocessor 1/1: Error: Problem with `mutate()` input `hp`.
# x no terms in scope
# i Input `hp` is `add1(hp)`.
# preprocessor 1/1: Error: Problem with `mutate()` input `hp`.
# x no terms in scope
# i Input `hp` is `add1(hp)`.
juliasilge commented 3 years ago

Thanks for this feedback! 🙌 I think you are right that we need to document this more clearly somewhere.

My initial thought for the best fit would be in the "Optimizations and Parallel Processing" vignette that folks find here.

smingerson commented 3 years ago

I'm no longer sure whether this issue is real; I cannot reproduce it on another computer (both machines are Windows 10, R 4.0.3). I'll see if I can figure out what's different.

zenggyu commented 3 years ago

I can reproduce it on Ubuntu 20.04; reprex below:

library(doParallel)
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────── tidymodels 0.1.3 ──
#> ✓ broom        0.7.6      ✓ recipes      0.1.16
#> ✓ dials        0.0.9      ✓ rsample      0.0.9 
#> ✓ dplyr        1.0.5      ✓ tibble       3.1.1 
#> ✓ ggplot2      3.3.3      ✓ tidyr        1.1.3 
#> ✓ infer        0.5.4      ✓ tune         0.1.5 
#> ✓ modeldata    0.1.0      ✓ workflows    0.2.2 
#> ✓ parsnip      0.1.5      ✓ workflowsets 0.0.2 
#> ✓ purrr        0.3.4      ✓ yardstick    0.0.8
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::accumulate() masks foreach::accumulate()
#> x purrr::discard()    masks scales::discard()
#> x dplyr::filter()     masks stats::filter()
#> x dplyr::lag()        masks stats::lag()
#> x recipes::step()     masks stats::step()
#> x purrr::when()       masks foreach::when()
#> • Use tidymodels_prefer() to resolve common conflicts.

data(mtcars)

square_root <- function(x) {
  sqrt(x)
}

tree_spec <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  set_args(cost_complexity = tune())

resamples <- vfold_cv(mtcars, v = 2)

preprocessor <- mtcars %>%
  recipe(mpg ~ .) %>%
  step_mutate(hp = square_root(hp))

grid_params <- grid_regular(
  cost_complexity(),
  levels = 2
)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# parallel::clusterExport(cl, c("square_root")) # uncomment this line to make it work

result <- tune_grid(
  tree_spec,
  preprocessor = preprocessor,
  resamples = resamples,
  grid = grid_params,
  metrics = metric_set(rmse)
)
#> Warning: All models failed. See the `.notes` column.

stopCluster(cl)

result
#> Warning: This tuning result has notes. Example notes on model fitting include:
#> preprocessor 1/1: Error: Problem with `mutate()` input `hp`.
#> x could not find function "square_root"
#> ℹ Input `hp` is `square_root(hp)`.
#> preprocessor 1/1: Error: Problem with `mutate()` input `hp`.
#> x could not find function "square_root"
#> ℹ Input `hp` is `square_root(hp)`.
#> # Tuning results
#> # 2-fold cross-validation 
#> # A tibble: 2 x 4
#>   splits          id    .metrics .notes          
#>   <list>          <chr> <list>   <list>          
#> 1 <split [16/16]> Fold1 <NULL>   <tibble [1 × 1]>
#> 2 <split [16/16]> Fold2 <NULL>   <tibble [1 × 1]>

Created on 2021-05-05 by the reprex package (v0.3.0)
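For what it's worth, a future-based foreach backend can sidestep the manual export entirely, since the future framework discovers and ships globals such as square_root() to the workers automatically. A minimal sketch of the same tuning call, assuming the doFuture package is installed:

library(doFuture)
registerDoFuture()
plan(multisession, workers = 2)  # globals are identified and exported automatically

result <- tune_grid(
  tree_spec,
  preprocessor = preprocessor,
  resamples = resamples,
  grid = grid_params,
  metrics = metric_set(rmse)
)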

simonpcouch commented 10 months ago

The Optimizations vignette now includes some notes on troubleshooting issues with parallel processing. As further changes haven't made it to the top of our to-do list in the last few years, I'm going to go ahead and close. Thank you for the issue. :)

github-actions[bot] commented 9 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.