Closed joeycouse closed 2 years ago
Thank you @joeycouse ! I've put in a PR for parsnip so that the tidymodels machinery can pick up the new dials parameter.
@simonpcouch do you have opinions on this? In particular: any comments on the default range chosen here? 🙌
This PR looks great, thank you @joeycouse!
100 seems reasonable for the upper bound. 31 feels possibly high for the lower bound, since 31 is lightgbm’s default. I would think something like 5, maybe 10?
@jameslamb, do you have any thoughts here? For context, this PR adds objects that support tidymodels tuning the num_leaves parameter. If a user notes they'd like to tune num_leaves but doesn't supply the grid of values they'd like to evaluate over, tidymodels will select a set of values in some default range. That range can be adjusted, as well as the number of draws from that range and the sampling design used to take those draws.
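To make the "default range plus number of draws" idea concrete, here is a minimal sketch in plain Python. It is not the dials implementation: the helper name regular_grid and the evenly spaced design are assumptions for illustration, and only the [5, 100] range comes from this thread.

```python
def regular_grid(lower: int, upper: int, levels: int) -> list[int]:
    """Return up to `levels` evenly spaced integers covering [lower, upper].

    Hypothetical helper for illustration; not part of dials or lightgbm.
    """
    if levels < 1 or lower > upper:
        raise ValueError("need levels >= 1 and lower <= upper")
    if levels == 1:
        return [lower]
    step = (upper - lower) / (levels - 1)
    # Round each point to an integer; a set drops duplicates that rounding
    # can create when the range is narrower than the number of levels.
    points = {round(lower + i * step) for i in range(levels)}
    return sorted(points)


# Default range discussed in this thread, with five candidate values.
default_grid = regular_grid(5, 100, levels=5)

# A user-adjusted, narrower range with fewer draws.
narrow_grid = regular_grid(5, 10, levels=3)
```

Changing `lower`, `upper`, or `levels` corresponds to the three knobs mentioned above: the range, the number of draws, and (here fixed to a regular design) how the draws are placed.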
A stray note, for bonsai: lightgbm allows passing this argument with a few different aliases. We ought to keep an eye out for users attempting to tune this parameter with one of these aliases, probably nudging them to just use the argument with this name so they get the tuning machinery "for free."
Related to https://github.com/tidymodels/bonsai/issues/49.
Thanks for the @ and context, @simonpcouch!
I agree that 31 is probably too high of a floor. Knowing nothing about the size and shape of data, I think the range num_leaves in [5, 100] is a pretty good starting point!
Please also keep in mind this related discussion from @dfsnow: https://github.com/tidymodels/bonsai/issues/49. Some combinations of max_depth and num_leaves are impossible.
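One way to read "impossible" here, as an assumption rather than a summary of the linked issue: a binary tree limited to depth d can have at most 2^d leaves, so a num_leaves above that bound cannot be reached. The sketch below filters a candidate grid on that rule. The function name is_feasible is hypothetical, and treating max_depth <= 0 as "no limit" follows LightGBM's documented convention.

```python
from itertools import product


def is_feasible(num_leaves: int, max_depth: int) -> bool:
    """Check whether a tree of at most `max_depth` levels can hold `num_leaves`.

    Hypothetical helper for illustration. A max_depth <= 0 is treated as
    "no depth limit", in which case any num_leaves is reachable.
    """
    if max_depth <= 0:
        return True
    # A binary tree of depth d has at most 2**d leaves.
    return num_leaves <= 2 ** max_depth


# Cross a few num_leaves values with a few max_depth values ...
candidates = list(product([5, 31, 100], [3, 6, -1]))

# ... and keep only the combinations that can actually occur.
feasible = [(leaves, depth) for leaves, depth in candidates
            if is_feasible(leaves, depth)]
```

Filtering after building the grid keeps the two parameter ranges independent, at the cost of ending up with fewer candidates than requested.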
with a few different aliases
LightGBM supports so many different aliases for parameters as a way to offer compatibility with a wide range of frameworks in different languages. I understand that that can make it challenging to configure, and has been a source of a lot of bugs and maintenance effort over the years 😭
We have some internal mechanisms in the R and Python packages (and another in the core C/C++ code) for resolving aliases. If you'd like, in a separate issue thread, I'd be happy to provide some links to those and to have a discussion about the possibility of making some of those mechanisms part of the public API that you could just import.
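For readers unfamiliar with the aliasing problem, a rough sketch of what resolving aliases for this one parameter might look like follows. This is not LightGBM's internal mechanism. The function canonicalize_num_leaves is hypothetical, and the alias set is taken from memory of the LightGBM parameter documentation, so it should be checked against the current docs.

```python
# Aliases for num_leaves, assumed from the LightGBM parameter docs.
NUM_LEAVES_ALIASES = {"num_leaf", "max_leaves", "max_leaf", "max_leaf_nodes"}


def canonicalize_num_leaves(params: dict) -> dict:
    """Return a copy of `params` with any num_leaves alias renamed.

    Hypothetical helper for illustration. Raises ValueError when the same
    parameter is supplied more than once under different names.
    """
    found = [key for key in params if key in NUM_LEAVES_ALIASES]
    if not found:
        return dict(params)
    if "num_leaves" in params or len(found) > 1:
        raise ValueError(
            "num_leaves supplied more than once: "
            + ", ".join(sorted(found + ["num_leaves"] * ("num_leaves" in params)))
        )
    # Drop the alias and store its value under the canonical name.
    resolved = {k: v for k, v in params.items() if k not in NUM_LEAVES_ALIASES}
    resolved["num_leaves"] = params[found[0]]
    return resolved


resolved = canonicalize_num_leaves({"max_leaves": 20, "learning_rate": 0.1})
```

Raising on duplicates, rather than silently picking one value, is a design choice made here to surface the ambiguity to the user.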
Thanks a lot for chiming in, @jameslamb!
Let's go with [5, 100], then. 🏄‍♀️
Following up here, I'll revisit bonsai's 49 and open up an issue in bonsai related to the aliasing interface. Thanks for your willingness to discuss / make changes here!🙂
Excellent, thanks for the input @jameslamb and thanks for the PR @joeycouse !
@simonpcouch I'll merge this PR and leave the parsnip one open until we've discussed release schedule etc 👍
This PR adds the num_leaves dials object. This is an engine-specific tuning parameter for 'lightgbm'. It's probably the main parameter for controlling the complexity of lightgbm models, since they grow leaf-wise instead of depth-wise (see the lightgbm docs).