tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

add num_leaves dials object #256

Closed joeycouse closed 2 years ago

joeycouse commented 2 years ago

This PR adds the num_leaves dials object. It is an engine-specific tuning parameter for 'lightgbm', and probably the main parameter for controlling the complexity of lightgbm models, since they grow trees leaf-wise instead of depth-wise (see the lightgbm doc).

hfrick commented 2 years ago

Thank you @joeycouse ! I've put in a PR for parsnip so that the tidymodels machinery can pick up the new dials parameter.

@simonpcouch do you have opinions on this? In particular: any comments on the default range chosen here? 🙌

simonpcouch commented 2 years ago

This PR looks great, thank you @joeycouse!

100 seems reasonable for the upper bound. 31 feels possibly high for the lower bound, since 31 is lightgbm’s default. I would think something like 5, maybe 10?

@jameslamb, do you have any thoughts here? For context, this PR adds objects that support tidymodels tuning the num_leaves parameter. If a user notes they'd like to tune num_leaves but doesn't supply the grid of values they'd like to evaluate over, tidymodels will select a set of values in some default range. That range can be adjusted, as well as the number of draws from that range and the sampling design used to take those draws.

A stray note, for bonsai: lightgbm allows passing this argument with a few different aliases. We ought to keep an eye out for users attempting to tune this parameter with one of these aliases, probably nudging them to just use the argument with this name so they get the tuning machinery "for free."

Related to https://github.com/tidymodels/bonsai/issues/49.

jameslamb commented 2 years ago

thanks for the @ and context @simonpcouch !

I agree that 31 is probably too high of a floor. Knowing nothing about the size and shape of data, I think the range num_leaves in [5, 100] is a pretty good starting point!

Please also keep in mind this related discussion from @dfsnow: https://github.com/tidymodels/bonsai/issues/49. Some combinations of max_depth and num_leaves are impossible.
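One way to read that constraint: a binary tree limited to depth d can have at most 2^d leaves, so a num_leaves value above 2^max_depth can never be reached. The Python sketch below is illustrative only. The helper names are hypothetical (they are not part of lightgbm, dials, or bonsai), and it assumes the lightgbm convention that a non-positive max_depth means "no depth limit".

```python
from typing import Optional


def max_leaves_for_depth(max_depth: int) -> Optional[int]:
    """Most leaves a binary tree of this depth can hold; None if depth is unlimited."""
    if max_depth <= 0:  # assumption: non-positive max_depth means no limit
        return None
    return 2 ** max_depth


def is_feasible(num_leaves: int, max_depth: int) -> bool:
    """True if a tree of at most max_depth levels could actually reach num_leaves."""
    cap = max_leaves_for_depth(max_depth)
    return cap is None or num_leaves <= cap


# Cross a few candidate values and keep only the reachable combinations.
candidates = [(n, d) for d in (2, 4, 8) for n in (5, 31, 100)]
feasible = [(n, d) for n, d in candidates if is_feasible(n, d)]
print(feasible)
```

With a range of [5, 100] for num_leaves, a max_depth of 2 rules out every candidate here, which is why a grid that crosses the two parameters independently can contain unreachable pairs.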

> with a few different aliases

LightGBM supports so many different aliases for parameters as a way to offer compatibility with a wide range of frameworks in different languages. I understand that that can make it challenging to configure, and has been a source of a lot of bugs and maintenance effort over the years 😭

We have some internal mechanisms in the R and Python packages (and another in the core C/C++ code) for resolving aliases. If you'd like, in a separate issue thread, I'd be happy to provide some links to those and to have a discussion about the possibility of making some of those mechanisms part of the public API that you could just import.
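To make the aliasing problem concrete, here is a minimal Python sketch of what resolving aliases to a main parameter name could look like. It is not lightgbm's internal mechanism, and `canonicalize` is a hypothetical helper. The alias list for num_leaves is an assumption based on the lightgbm parameter documentation and should be checked against the version in use.

```python
# Assumed aliases for num_leaves; verify against the lightgbm parameter docs.
NUM_LEAVES_ALIASES = ("num_leaf", "max_leaves", "max_leaf", "max_leaf_nodes")


def canonicalize(params, main_name="num_leaves", aliases=NUM_LEAVES_ALIASES):
    """Return a copy of params in which any alias is folded into main_name.

    If main_name is already present it wins and aliases are dropped;
    otherwise the first alias found (in the order of `aliases`) is used.
    """
    resolved = {k: v for k, v in params.items() if k not in aliases}
    if main_name not in params:
        supplied = [k for k in aliases if k in params]
        if supplied:
            resolved[main_name] = params[supplied[0]]
    return resolved


print(canonicalize({"max_leaves": 20, "learning_rate": 0.1}))
```

A wrapper that normalizes names this way could detect a user tuning an alias and point them to the main argument name.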

simonpcouch commented 2 years ago

Thanks a lot for chiming in, @jameslamb!

Let's go with [5, 100], then. 🏄‍♀️

Following up here, I'll revisit bonsai issue #49 and open an issue in bonsai about the aliasing interface. Thanks for your willingness to discuss / make changes here! 🙂

hfrick commented 2 years ago

Excellent, thanks for the input @jameslamb and thanks for the PR @joeycouse !

@simonpcouch I'll merge this PR and leave the parsnip one open until we've discussed release schedule etc 👍
