tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

Simple parameter vector c(1,5,3) #148

Closed: Steviey closed this issue 4 years ago

Steviey commented 4 years ago

R 4.2, tidymodels latest

Hello,

I'm just wondering whether there is an option in dials to define a simple parameter vector, say c(1,5,3), instead of using a range, sequence, etc.?

I tried to use value_set()...


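pslTidyModelOptions[['modelParams']] <- pslTidyModelOptions[['modelParams']] %>% 
    stats::update(
        # fixed learning rate on the raw scale (trans = NULL drops the default log10 transform)
        learn_rate = dials::learn_rate(range = c(0.1, 0.1), trans = NULL)
        # sampled proportion of rows between 0.5 and 0.9
        , sample_size = dials::sample_prop(range = c(0.5, 0.9), trans = NULL)
        # only these exact depths should be used
        , tree_depth = dials::value_set(dials::tree_depth(), values = c(4, 6, 8, 12))
    )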

...which seems to partly work...

 mtry tree_depth learn_rate sample_size .iter .metric .estimator  mean     n std_err
  <int>      <int>      <dbl>       <dbl> <dbl> <chr>   <chr>      <dbl> <int>   <dbl>
1     1          4        0.1         0.9     0 mae     standard    6.17     2  0.0272
2    12          6        0.1         0.9     0 mae     standard    6.19     2  0.0231
3     1          4        0.1         0.5     0 mae     standard    6.21     2  0.0181
4    12          6        0.1         0.5     0 mae     standard    6.21     2  0.0191
5     1          6        0.1         0.9     0 mae     standard    6.22     2  0.0409
# … with 1 more variable: .config <chr>

... but then I also get this error...

── Iteration 1 ──────────────────────────────────────────────────────────────────────────

i Current best:     rmse=7.667 (@iter 0)
i Gaussian process model
! Gaussian process model: no non-missing arguments to min; returning Inf, no non-missi...
x Gaussian process model: Error in GPfit::GP_fit(X = x, Y = dat$mean, ...): The dimens...
! An error occurred when creating candidates parameters:  Error in GPfit::GP_fit(X = x, Y = dat$mean, ...) : 
  The dimensions of X and Y do not match. 

x Skipping to next iteration
Error in eval(expr, p) : no loop for break/next, jumping to top level
     █
  1. ├─global::bAlgos(i, f, "testing") ~/R/daScript.R:40340:12
  2. │ └─global::standAloneAlgos(...) ~/R/daScript.R:39955:12
  3. │   └─psl$genTidyTune(pslTidyModelOptions) ~/R/daScript.R:17726:12
  4. │     ├─tune::tune_bayes(...) R/PslTools.R:27222:16
  5. │     └─tune:::tune_bayes.workflow(...)
  6. │       └─tune:::tune_bayes_workflow(...)
  7. │         └─tune:::check_and_log_flow(control, candidates)
  8. │           └─base::eval.parent(parse(text = "next"))
  9. │             └─base::eval(expr, p)
 10. │               └─base::eval(expr, p)
 11. └─(function () ...
 12.   └─lobstr::cst() ~/R/daScript.R:9:25

PS: I searched the first quartile of the internet for the answer. Now I'm here :-)

How to use simple parameter vectors like c(1,5,3) in dials? Is it possible at all?

Thx a lot.

topepo commented 4 years ago

It looks like grid_regular() respects the specific values and the space-filling designs do not. That makes sense, but we should document it.

library(tidymodels)
#> ── Attaching packages ──────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0          ✓ recipes   0.1.13    
#> ✓ dials     0.0.9.9000     ✓ rsample   0.0.8     
#> ✓ dplyr     1.0.2          ✓ tibble    3.0.3     
#> ✓ ggplot2   3.3.2          ✓ tidyr     1.1.2     
#> ✓ infer     0.5.2          ✓ tune      0.1.1.9000
#> ✓ modeldata 0.0.2          ✓ workflows 0.2.1.9000
#> ✓ parsnip   0.1.3.9000     ✓ yardstick 0.0.7     
#> ✓ purrr     0.3.4
#> ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()

mod <- rand_forest(mtry = tune(), min_n = tune()) %>% 
   set_mode("regression") %>% 
   set_engine("ranger")

param <- 
   mod %>% 
   parameters() %>% 
   update(mtry = mtry(c(1, 5)) %>% value_set(1:3))

grid_regular(param, levels = 5)
#> # A tibble: 15 x 2
#>     mtry min_n
#>    <int> <int>
#>  1     1     2
#>  2     2     2
#>  3     3     2
#>  4     1    11
#>  5     2    11
#>  6     3    11
#>  7     1    21
#>  8     2    21
#>  9     3    21
#> 10     1    30
#> 11     2    30
#> 12     3    30
#> 13     1    40
#> 14     2    40
#> 15     3    40

set.seed(1)
grid_max_entropy(param, size = 10)
#> # A tibble: 10 x 2
#>     mtry min_n
#>    <int> <int>
#>  1     1    24
#>  2     3    17
#>  3     2    38
#>  4     4     2
#>  5     3     5
#>  6     1     5
#>  7     4    38
#>  8     3    29
#>  9     5    15
#> 10     5    27

Created on 2020-10-15 by the reprex package (v0.3.0)
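As a quick check on the point above, the mtry column printed by grid_max_entropy() can be compared with the value set 1:3 in base R. This is only arithmetic on the printed values, not a dials function; it shows that four of the ten candidates use mtry values (4 and 5) outside the set, apparently because the space-filling design samples from mtry's c(1, 5) range rather than from the value set.

```r
# mtry values copied from the grid_max_entropy() output in the reprex
mtry_max_entropy <- c(1L, 3L, 2L, 4L, 3L, 1L, 4L, 3L, 5L, 5L)

# The value set that was given to mtry via value_set(1:3)
allowed_mtry <- 1:3

# TRUE for every candidate whose mtry is not in the value set
outside <- !(mtry_max_entropy %in% allowed_mtry)

print(sum(outside))                             # 4
print(sort(unique(mtry_max_entropy[outside])))  # 4 5
```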

You can create the grid you want using data.frame() or tibble().
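A minimal sketch of such a hand-written grid in base R, using illustrative values for the mtry and min_n parameters from the reprex. Each row is one candidate; tibble::tibble() accepts the same name = vector form. Assuming the column names match the tuning parameter ids, a data frame like this can be passed to the `grid` argument of tune::tune_grid(), and its values are evaluated exactly as written.

```r
# Hand-written tuning grid: one row per candidate combination.
# The values are illustrative; column names must match the parameter ids.
custom_grid <- data.frame(
  mtry  = c(1L, 3L, 5L),
  min_n = c(2L, 11L, 21L)
)

print(custom_grid)
```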

Otherwise, without a small, reproducible example I can't really tell what might be the issue with your specific code.

Steviey commented 4 years ago

> Otherwise, without a small, reproducible example I can't really tell what might be the issue with your specific code.

That's not easy with 70K+ lines of code to maintain (a highly abstracted framework), but I'm trying my best to provide enough code.

If this is the right notation (I do not want a range)...

tree_depth = dials::value_set(tree_depth(),values=c(4,6,8,12))

...maybe there are just not enough levels defined.

https://www.rdocumentation.org/packages/dials/versions/0.0.8/topics/value_validate

Steviey commented 4 years ago

Playing with that example suggests that this is indeed the correct notation for a simple numeric parameter vector in dials.

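library(tidymodels)

mod <- rand_forest(mtry = tune(), min_n = tune()) %>% 
   set_mode("regression") %>% 
   set_engine("ranger")

# restrict mtry to an explicit set of values instead of a range
param <- 
   mod %>% 
   parameters() %>% 
   update(mtry = mtry(c(1, 5)) %>% value_set(values = c(1, 3, 5, 4, 2)))

# grid_regular() uses the values from the value set
info <- grid_regular(param, levels = 5)
print(info)

# mtry() without arguments has an unknown upper bound
paramInfo <- mtry() %>% range_get()
print(paramInfo)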

...However, the error mentioned above persists when using value_set(values=c(1,3,5,4,2)), so the number of levels is not the reason.

Meanwhile, I switched to tidyr::crossing() to build a grid from simple numeric vectors without dials and pass it to tune_bayes(). But then other problems occur (unknowns). This drives me nuts.
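For reference, a base-R sketch of the kind of grid crossing() builds, using the values from the first post (tree_depth in c(4, 6, 8, 12), the sample_size endpoints 0.5 and 0.9, and a fixed learn_rate of 0.1). expand.grid() varies its first column fastest, so the rows are reordered to match crossing()'s layout, where the first column varies slowest. Note that tune_bayes() has no `grid` argument: a fixed grid like this is normally evaluated with tune_grid(), and those results can then be handed to tune_bayes() through its `initial` argument.

```r
# Candidate values taken from the first post in this thread
tree_depth_values  <- c(4, 6, 8, 12)
sample_size_values <- c(0.5, 0.9)
learn_rate_values  <- 0.1

# All combinations of the values (4 x 2 x 1 = 8 rows)
full_grid <- expand.grid(
  tree_depth  = tree_depth_values,
  sample_size = sample_size_values,
  learn_rate  = learn_rate_values
)

# Reorder so the first column varies slowest, as in tidyr::crossing()
full_grid <- full_grid[order(full_grid$tree_depth, full_grid$sample_size), ]
rownames(full_grid) <- NULL

print(full_grid)
```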

juliasilge commented 4 years ago

Sounds like crossing() is a good way to go for you to create a grid of parameters, given your constraints.

Frustrating to still be running into problems with your iterative tuning! If you can create a small reproducible example and post it on the tune repo, we can try to help find out what the problem is. I imagine you've already seen this and this for how to get started and best practices.

Steviey commented 4 years ago

@juliasilge Thank you, Julia. With my trial-and-error and research approach, it takes hours to answer the simplest questions, and I'm not sure why. Either something is heavily over-engineered here, or I'm too stupid. In the end I always get there, but there is no real plug-and-play feeling. And honestly, apart from the recipes approach, I can't see the big advantage of so much diversity over caret. Anyway...

...I had to do the following...

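,mtry=c(1,2)# has to be like this
,sample_size=c(1,2)# has to be like this

pslTidyModelOptions[['modelParams']] <- pslTidyModelOptions[['modelWorkFlow']] %>%
    parameters() %>%
    update(
        # mtry upper bound from the number of training columns
        # (if the training data include the outcome, ncol() counts it too)
        mtry = mtry(range = c(1L, ncol(pslTidyModelOptions[['train']])))
        # sample_size upper bound from the number of training rows
        ,sample_size = sample_size(range = c(1L, nrow(pslTidyModelOptions[['train']])))
    )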
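One detail worth checking in this fix: if the training data frame also contains the outcome, ncol() overstates the number of predictors available to mtry. The minimal base-R sketch below uses a hypothetical `train` data frame with a hypothetical outcome column `y` and derives both upper bounds from the data. dials also offers finalize(), which fills in mtry()'s unknown upper bound from a data frame of predictors; see its documentation for the exact behavior.

```r
# Hypothetical training data: 'y' is the outcome, the other columns are predictors
train <- data.frame(
  y  = c(1.2, 3.4, 2.2, 5.1, 4.0),
  x1 = c(0.1, 0.5, 0.3, 0.9, 0.7),
  x2 = c(10, 20, 15, 30, 25),
  x3 = c(1, 0, 1, 0, 1)
)
outcome <- "y"

# Predictors are every column except the outcome
predictor_names <- setdiff(names(train), outcome)

# Upper bounds derived from the data
mtry_range        <- c(1L, length(predictor_names))  # 1 to 3 predictors
sample_size_range <- c(1L, nrow(train))              # 1 to 5 rows

print(mtry_range)
print(sample_size_range)
```

If a recipe adds columns (for example dummy variables), the predictor count after preprocessing can differ from the raw column count.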
github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.