Closed afewmoments closed 3 years ago
You can read a bit more about tuning engine-specific parameters in this blog post.
If you feel that it is likely that many people will want to tune lambda, we could move this over to dials and keep this as an open issue there, considering it for prioritization to add to the engine-specific parameters for xgboost.
In the meantime, what you need to do is create a parameter function and then a parameter set:
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from
#>   required_pkgs.model_spec parsnip
xgb_spec <-
  boost_tree("regression",
             trees = tune(),
             learn_rate = 0.02) %>%
  set_engine("xgboost", lambda = tune())

lambda <- function(range = c(-10, 0), trans = log10_trans()) {
  new_quant_param(
    type = "double",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(lambda = "Amount of Regularization"),
    finalize = NULL
  )
}

param_set <- parameters(list(lambda(), trees()))
car_folds <- vfold_cv(mtcars, v = 3)
tune_grid(xgb_spec, mpg ~ ., resamples = car_folds, param_info = param_set)
#> # Tuning results
#> # 3-fold cross-validation
#> # A tibble: 3 × 4
#>   splits          id    .metrics          .notes
#>   <list>          <chr> <list>            <list>
#> 1 <split [21/11]> Fold1 <tibble [20 × 6]> <tibble [0 × 1]>
#> 2 <split [21/11]> Fold2 <tibble [20 × 6]> <tibble [0 × 1]>
#> 3 <split [22/10]> Fold3 <tibble [20 × 6]> <tibble [0 × 1]>
Created on 2021-08-05 by the reprex package (v2.0.0)
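For anyone who wants explicit control over the candidate values, the same parameter set can also be turned into a grid. This is only a minimal sketch, assuming grid_regular() from dials and the custom lambda() function from the reprex; the name xgb_grid is purely illustrative.

```r
library(tidymodels)

# Same custom parameter function as in the reprex: the range is given on
# the log10 scale, so c(-10, 0) spans 1e-10 to 1 on the natural scale.
lambda <- function(range = c(-10, 0), trans = log10_trans()) {
  new_quant_param(
    type = "double",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(lambda = "Amount of Regularization"),
    finalize = NULL
  )
}

param_set <- parameters(list(lambda(), trees()))

# Regular grid with 3 levels per parameter (3 x 3 = 9 candidates).
# xgb_grid is an illustrative name, not something from the thread.
xgb_grid <- grid_regular(param_set, levels = 3)
xgb_grid
```

The resulting tibble could then be supplied through the grid argument of tune_grid() instead of relying on the default grid.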
@juliasilge thank you for this solution! It would be kind of you to move my request to dials, because it seems to me that, as xgboost grows in popularity, more people will want to tune engine-specific parameters.
Next step is to add the engine-specific lambda and alpha here. This might be a good opportunity for a first-time contributor.
Working on a PR for these new xgboost-specific tuning parameters, but creating the alpha() parameter masks scales::alpha(), which I'm assuming we'd like to avoid. Is there a SOP for this situation?
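To make the masking concern concrete, here is a small sketch. The alpha() defined below is a hypothetical stand-in for a dials-style parameter function, not real dials code; it only shows how a new function with that name would shadow the one from scales.

```r
library(scales)

# Hypothetical stand-in for a new parameter function named alpha().
# Defining it after scales is attached masks scales::alpha().
alpha <- function() {
  "hypothetical parameter object"
}

# An unqualified call now finds the new definition first ...
alpha()

# ... while the original stays reachable through its namespace.
scales::alpha("red", 0.5)
```

Code that calls alpha() without the scales:: prefix would silently pick up the new function, which is why a different name is attractive.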
Any thoughts on using the existing code for penalty() and mixture() and having some conversion that maps to the corresponding xgboost-specific parameters lambda and alpha? I'd imagine that'd be pretty complex, and wouldn't match the true xgboost parameters here.
This is more of a parsnip complaint, but I think the documentation could use some clarification on what is really an engine-specific tuning parameter:
For instance, the boost_tree() docs have stop_iter() as (specific engines only). This is only tunable with the xgboost engine, but it can be tuned within boost_tree(stop_iter = tune()). Shouldn't this parameter get the same treatment as scale_pos_weight() and have to be tuned within set_engine('xgboost', stop_iter = tune()), since it is only tunable with the xgboost engine?
Interested in your thoughts!
As far as what is a main argument vs. an engine-specific argument, here is how we lay it out in TMwR:
Modeling functions in parsnip separate model arguments into two categories:
- Main arguments are more commonly used and tend to be available across engines.
- Engine arguments are either specific to a particular engine or used more rarely.
It is absolutely somewhat subjective and a judgment call.
I think the idea with stop_iter is that, although I am only aware of xgboost having it, it is used a lot by xgboost users so it is more helpful for us to fully support it as a main argument. I don't think the same argument can be made for scale_pos_weight. These are definitely subjective calls, though.
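As a rough illustration of the distinction, the sketch below marks stop_iter for tuning as a main argument and scale_pos_weight as an engine argument. The object names spec_main and spec_engine and the fixed trees = 500 are assumptions made for the example only.

```r
library(tidymodels)

# stop_iter is a main argument, so it is marked inside boost_tree()
spec_main <-
  boost_tree("regression", trees = 500, stop_iter = tune()) %>%
  set_engine("xgboost")

# scale_pos_weight is an engine argument, so it is marked inside set_engine()
spec_engine <-
  boost_tree("classification", trees = 500) %>%
  set_engine("xgboost", scale_pos_weight = tune())

# tune_args() lists the arguments flagged with tune() in each spec
tune_args(spec_main)
tune_args(spec_engine)
```

Either way the placeholder is picked up for tuning; the difference is only where the argument is declared.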
Thank you for being willing to contribute a PR to add these engine-specific parameters! 🙌 @hfrick do you have an opinion on how to name alpha and lambda to reduce confusion/masking? I know I use alpha a lot as an argument name in ggplot2 but I don't know that I use the function that much.
I think penalty_L1() and penalty_L2() would be good! This would be nice to put in the next release of dials, which we are aiming for next week. @joeycouse would that timeline work for you or would you prefer us adding it in ourselves?
I'll try my best! I just opened a PR but I think some edits will need to be done on the parsnip side.
Thank you @joeycouse !
Closed in #179
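For readers landing here later, usage with the names proposed above might look like the sketch below. It assumes a dials version that exports penalty_L1() and penalty_L2(), and it pairs alpha with the L1 penalty and lambda with the L2 penalty, following the xgboost parameter documentation. The list names are chosen to match the tune() placeholders in the spec; param_set and xgb_grid are illustrative names.

```r
library(tidymodels)

xgb_spec <-
  boost_tree("regression", trees = tune(), learn_rate = 0.02) %>%
  set_engine("xgboost", lambda = tune(), alpha = tune())

# Assumption: penalty_L1() and penalty_L2() are available in the installed
# dials. In xgboost, alpha is the L1 term and lambda is the L2 term.
param_set <- parameters(
  list(lambda = penalty_L2(), alpha = penalty_L1(), trees = trees())
)

# 2 levels per parameter gives 2^3 = 8 candidate combinations
xgb_grid <- grid_regular(param_set, levels = 2)
xgb_grid
```

The grid, or the parameter set itself through param_info, could then be handed to tune_grid() as in the earlier reprex.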
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I want to be able to tune() all available hyperparameters in xgboost (in particular lambda and alpha). It seems that I can only tune() what is listed in the help https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html:
- tree_depth
- trees
- learn_rate
- mtry
- min_n
- loss_reduction
- sample_size
- stop_iter
Currently, to specify lambda I need to specify explicitly:
How can I tune() lambda in that case? In general, why can't I tune() all hyperparameters available on https://xgboost.readthedocs.io/en/latest/parameter.html?