Closed cimentadaj closed 4 years ago
This is happening, at least for dials:::grid_latin_hypercube, in https://github.com/tidymodels/dials/blob/611260de1729e41843fc3ea9b8baaa916bcd3270/R/space_filling.R#L202
However, I don't understand why, since calling set.seed with the same value should always make sample.int return the same result:
set.seed(32151); sample.int(10^5, 1)
[1] 83640
set.seed(32151); sample.int(10^5, 1)
[1] 83640
I'm calling this a bug because it looks like the seed isn't getting propagated correctly somewhere, in one of these two cases.
Ok, so I've been tinkering with this to make it work. Focusing only on the linear_reg example, the only parameter that differs between the two grid_regular calls is mixture. For example:
library(parsnip)
library(dials)
#> Loading required package: scales
library(tune)
reg_mod <-
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")
p1 <- parameters(reg_mod)
p2 <- parameters(penalty(), mixture())
set.seed(42131)
grid_regular(p1, levels = 3)
#> # A tibble: 9 x 2
#> penalty mixture
#> <dbl> <dbl>
#> 1 0.0000000001 0.05
#> 2 0.00001 0.05
#> 3 1 0.05
#> 4 0.0000000001 0.525
#> 5 0.00001 0.525
#> 6 1 0.525
#> 7 0.0000000001 1
#> 8 0.00001 1
#> 9 1 1
set.seed(42131)
grid_regular(p2, levels = 3)
#> # A tibble: 9 x 2
#> penalty mixture
#> <dbl> <dbl>
#> 1 0.0000000001 0
#> 2 0.00001 0
#> 3 1 0
#> 4 0.0000000001 0.5
#> 5 0.00001 0.5
#> 6 1 0.5
#> 7 0.0000000001 1
#> 8 0.00001 1
#> 9 1 1
Here, penalty is the same in both calls, yet mixture is not. After digging in a bit, this is because the range of mixture is forced to start at 0.05 when the engine is glmnet. Inside parameters, tunable is called on the first line:
tune:::parameters.model_spec
#> function (x, ...)
#> {
#> all_args <- tunable(x)
#> tuning_param <- tune_args(x)
#> res <- dplyr::inner_join(tuning_param %>% dplyr::select(-tunable,
#> -component_id), all_args, by = c("name", "source", "component")) %>%
#> mutate(object = purrr::map(call_info, eval_call_info))
#> dials::parameters_constr(res$name, res$id, res$source, res$component,
#> res$component_id, res$object)
#> }
#> <bytecode: 0x55724d5d3898>
#> <environment: namespace:tune>
And the method tunable.linear_reg explicitly changes this range:
tune:::tunable.linear_reg
#> function (x, ...)
#> {
#> res <- NextMethod()
#> if (x$engine == "glmnet") {
#> res$call_info[res$name == "mixture"] <- list(list(pkg = "dials",
#> fun = "mixture", range = c(0.05, 1)))
#> }
#> res
#> }
#> <bytecode: 0x55724d5d6c20>
#> <environment: namespace:tune>
So the problem, at least for linear_reg, is not related to set.seed. Is there a particular reason why this is done? A mixture of 0 is a completely valid value to search over in a tuning grid.
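To make the point concrete, here is a minimal base R sketch of why the seed cannot matter for the regular grid. It assumes a regular grid simply takes evenly spaced values across each parameter's range; regular_levels is a hypothetical helper written for illustration, not a dials function, and the penalty range of [-10, 0] on the log10 scale is inferred from the output above.

```r
# Hypothetical stand-in for what a regular grid does per parameter:
# evenly spaced values across the range (an assumption, not dials code).
regular_levels <- function(range, levels = 3) {
  seq(range[1], range[2], length.out = levels)
}

# Range injected by tunable.linear_reg when the engine is glmnet
glmnet_mixture <- regular_levels(c(0.05, 1))

# Range of a bare mixture() call
default_mixture <- regular_levels(c(0, 1))

# penalty: same log10 range in both calls (inferred from the printed grids)
penalty_levels <- 10^regular_levels(c(-10, 0))

print(glmnet_mixture)
print(default_mixture)
print(penalty_levels)
```

No random numbers are drawn here, so set.seed has nothing to act on. If this reading is right, passing mixture(range = c(0.05, 1)) in the manual call should reproduce the grid obtained from parameters(reg_mod).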
The problems are completely unrelated to the grid_* functions. They all come from parameters. For the first example using svm_rbf, the inconsistency arises because the range of cost stored internally in parsnip for svm_rbf differs from the default range of the cost function. For example:
library(parsnip)
library(dials)
#> Loading required package: scales
library(tune)
svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")
res <- tune::tunable(svm_mod)
res[res$name == "cost", "call_info", drop = TRUE]
#> [[1]]
#> [[1]]$pkg
#> [1] "dials"
#>
#> [[1]]$fun
#> [1] "cost"
#>
#> [[1]]$range
#> [1] -10 5
cost()
#> Cost (quantitative)
#> Transformer: log-2
#> Range (transformed scale): [-10, -1]
In parsnip, the default range is -10 to 5, while in cost it's -10 to -1 (both on the log-2 scale). Based on this, we can fix the first example to work as expected:
set.seed(42131)
grid_latin_hypercube(parameters(svm_mod), size = 10)
#> # A tibble: 10 x 2
#> cost rbf_sigma
#> <dbl> <dbl>
#> 1 0.00319 3.30e-10
#> 2 2.01 7.35e- 2
#> 3 0.185 1.86e- 8
#> 4 0.00817 1.82e- 7
#> 5 0.0673 4.54e- 3
#> 6 0.0601 8.10e- 6
#> 7 0.00169 6.81e- 1
#> 8 8.03 1.11e- 9
#> 9 0.952 9.93e- 4
#> 10 20.9 8.88e- 5
set.seed(42131)
grid_latin_hypercube(cost(range = c(-10, 5)), rbf_sigma(), size = 10)
#> # A tibble: 10 x 2
#> cost rbf_sigma
#> <dbl> <dbl>
#> 1 0.00319 3.30e-10
#> 2 2.01 7.35e- 2
#> 3 0.185 1.86e- 8
#> 4 0.00817 1.82e- 7
#> 5 0.0673 4.54e- 3
#> 6 0.0601 8.10e- 6
#> 7 0.00169 6.81e- 1
#> 8 8.03 1.11e- 9
#> 9 0.952 9.93e- 4
#> 10 20.9 8.88e- 5
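As a rough illustration of why the original grids differed even with identical seeds, the base R sketch below back-transforms both log-2 ranges and rescales the same uniform draws into each. This is a simplified stand-in, not the Latin hypercube design dials uses; draw_in_range is a hypothetical helper introduced only for this example.

```r
parsnip_range <- c(-10, 5)   # range attached to cost for svm_rbf (from tunable())
dials_range   <- c(-10, -1)  # default range printed by cost()

# Both ranges are on the log-2 scale, so back-transform to natural units.
print(2^parsnip_range)
print(2^dials_range)

# Hypothetical helper: same seed gives the same uniform draws,
# but they are rescaled into whichever range is supplied.
draw_in_range <- function(range, n, seed) {
  set.seed(seed)
  u <- runif(n)                    # identical for a fixed seed
  2^(range[1] + u * diff(range))   # rescale, then undo the log-2 transform
}

a <- draw_in_range(parsnip_range, 5, 42131)
b <- draw_in_range(dials_range, 5, 42131)

print(a)
print(b)
```

The seed is honoured in both calls; the values differ only because the range they are mapped into differs. That is consistent with cost values above 0.5 (such as 20.9) appearing in the grids above, which the default cost() range could not produce.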
Not sure whether this is intended or a bug. In any case, I believe fixing it would be a breaking change in either parsnip or cost.
This is better documented now in the pages for the grid functions.
I'm trying to create tuning grids through two different approaches. The first approach is to define a model, populate the parameters with tune(), extract the tuning parameters with parameters, and then pass the result to a grid function. Alternatively, I define tuning grids manually by passing the parameter-specific functions (for example, cost()) to the grid function. To my surprise, the two approaches yield different grids even when I set the same seed before each call. Is this expected?
Below are two examples using the same model, one specifying cost() manually and another using parameters directly: