Closed SewerynGrodny closed 4 years ago
mtry depends on the number of columns in the data, so the upper bound of its range cannot be set ahead of time. The finalize() function can set it if you pass in the predictors:
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────────────────── tidymodels 0.0.4 ──
#> ✓ broom 0.5.4 ✓ recipes 0.1.9
#> ✓ dials 0.0.4 ✓ rsample 0.0.5
#> ✓ dplyr 0.8.4 ✓ tibble 2.1.3
#> ✓ ggplot2 3.2.1 ✓ tune 0.0.1
#> ✓ infer 0.5.1 ✓ workflows 0.1.0
#> ✓ parsnip 0.0.5 ✓ yardstick 0.0.5
#> ✓ purrr 0.3.3
#> ── Conflicts ───────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step() masks stats::step()
#> x recipes::yj_trans() masks scales::yj_trans()
rf_params_cars = parameters(mtry(), min_n())
rf_params_cars
#> Collection of 2 parameters for tuning
#>
#>     id parameter type object class
#>   mtry           mtry    nparam[?]
#>  min_n          min_n    nparam[+]
#>
#> Parameters needing finalization:
#> # Randomly Selected Predictors ('mtry')
#>
#> See `?dials::finalize` or `?dials::update.parameters` for more information.
rf_params_cars <-
  rf_params_cars %>%
  update(mtry = finalize(mtry(), mtcars %>% select(-mpg)))
rf_params_cars
#> Collection of 2 parameters for tuning
#>
#>     id parameter type object class
#>   mtry           mtry    nparam[+]
#>  min_n          min_n    nparam[+]
set.seed(131)
rf_grid_cars = grid_max_entropy(rf_params_cars, size = 3)
rf_grid_cars
#> # A tibble: 3 x 2
#>    mtry min_n
#>   <int> <int>
#> 1     4    34
#> 2     9    21
#> 3     2    16
Created on 2020-02-24 by the reprex package (v0.3.0)
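To see what finalization changes, it can help to inspect the parameter's range before and after. The following is a minimal sketch rather than part of the original reprex; it assumes the dials helpers has_unknowns() and range_get() are available in the installed version, and the variable names are illustrative only.

```r
library(dials)

# Predictors only: every mtcars column except the outcome (mpg).
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Before finalization the upper bound of mtry is an unknown placeholder.
mtry_raw <- mtry()
has_unknowns(mtry_raw)

# finalize() fills in the upper bound from the number of predictor columns.
mtry_final <- finalize(mtry_raw, predictors)
has_unknowns(mtry_final)
range_get(mtry_final)
```

Once the range is complete, the parameter can be passed to grid functions such as grid_max_entropy(), as in the reprex above.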
We need a better error message though.
(edit - hit wrong key)
I'm going to move this to dials and update the title.
There is also a minor problem with the show_best() function, which throws an error if there are NA values in .metric.
That's because the message (and the entries in the .notes column) tell you that
> rf_stage_1_cv_results_tbl_oto$.notes[[5]]$.notes
[1] "internal: A correlation computation is required, but `estimate` is constant
and has 0 standard deviation, resulting in a divide by 0 error. `NA` will be
returned."
This happens when a model predicts the same value for all samples.
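As a rough illustration of why this produces NA, the sketch below uses base R only and made-up numbers (not data from this issue). It assumes the affected metric is correlation-based, as the note suggests: a constant prediction has zero standard deviation, so the correlation is undefined, while an error-based metric such as RMSE is still computable.

```r
# Hypothetical observed values and a model that predicts the same value everywhere.
truth <- c(21.0, 22.8, 18.7, 14.3)
constant_estimate <- rep(20, length(truth))

# The predictions have zero standard deviation ...
sd(constant_estimate)

# ... so the correlation is undefined; base R warns and returns NA.
r <- suppressWarnings(cor(truth, constant_estimate))
is.na(r)

# RMSE does not rely on a correlation, so it is still a finite number.
rmse_value <- sqrt(mean((truth - constant_estimate)^2))
rmse_value
```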
The main error in the code was the missing metric argument:
> rf_stage_1_cv_results_tbl_oto %>% show_best()
Error in check_metric_choice(metric, maximize) :
argument "metric" is missing, with no default
> rf_stage_1_cv_results_tbl_oto %>% show_best(metric = "rmse", maximize = FALSE)
# A tibble: 5 x 6
  min_n .metric .estimator  mean     n std_err
  <int> <chr>   <chr>      <dbl> <int>   <dbl>
1     2 rmse    standard    2.06     5   0.292
2     5 rmse    standard    2.24     5   0.286
3     7 rmse    standard    2.49     5   0.271
4     9 rmse    standard    2.65     5   0.247
5    11 rmse    standard    2.94     5   0.204
Hi, thanks for the great tidymodels packages. (Great job!) While training random forest models, I've encountered an issue with tune grid and parameters. It seems that mtry() is not supported (case 2 in the code below). There is also a minor problem with the show_best() function, which throws an error if there are NA values in .metric.
Best, Sewe
Reproducible example