microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

Option for groupKFold for regression problems #56

Closed: suryajayaraman closed this issue 3 years ago

suryajayaraman commented 3 years ago

Hi,

I'm trying to tune LightGBM for a regression problem and need to use groupKFold for cross-validation. By default, automl.fit() uses repeatedkfold as the split_type. I looked through the documentation but couldn't find any details on this. Also, how do I pass the groups argument to it?

Thanks in advance.

sonichi commented 3 years ago

Hi @suryajayaraman, there are two ways to go about it:

  1. Add support for split_type='group' in AutoML.fit(), in addition to 'stratified' and 'uniform'. Group weights can be passed and handled in a similar way as 'sample_weight'. Feel free to create a PR.
  2. Use flaml.tune.run() to which you can pass a custom function evaluate_config. You can put the groupKFold cross-validation inside that custom function.
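
To make option 2 more concrete, here is a minimal, dependency-free sketch of a custom evaluation function that does group-wise cross-validation. It is an illustration under assumptions, not FLAML code: `group_kfold_indices`, the `fit_predict` callback, and the returned `"mse"` key are hypothetical names chosen for this example, and the hand-written splitter only stands in for a library splitter such as scikit-learn's `GroupKFold`.

```python
from collections import defaultdict


def group_kfold_indices(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group is split across both sides.

    Hypothetical helper for illustration; it mimics the idea of group K-fold.
    """
    # Collect the sample indices that belong to each group label.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: -len(members[g])):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield train_idx, test_idx


def evaluate_config(config, X, y, groups, fit_predict, n_splits=3):
    """Score one hyperparameter config by mean squared error over group-wise folds.

    `fit_predict(config, X_train, y_train, X_test)` is an assumed callback that
    trains a model (for example LightGBM) with `config` and returns predictions.
    """
    fold_losses = []
    for train_idx, test_idx in group_kfold_indices(groups, n_splits):
        preds = fit_predict(
            config,
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        mse = sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx)) / len(test_idx)
        fold_losses.append(mse)
    # Return the averaged loss under a named metric.
    return {"mse": sum(fold_losses) / len(fold_losses)}
```

To use something like this with flaml.tune.run(), one would bind the data arguments first (for example with `functools.partial`) so that the tuner only has to supply `config`, and then ask it to minimize the returned metric. The exact arguments of flaml.tune.run() and the expected return format should be checked against the FLAML documentation for the installed version.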

dtquandt commented 3 years ago

+1, I'm having the same issue but for classification. Trying to understand the underlying code better so that I can think about whether I feel confident adding it myself.

sonichi commented 3 years ago

@dtquandt @suryajayaraman Let's discuss it on Gitter.

sonichi commented 3 years ago

@suryajayaraman @dtquandt The PR has been created. Please let us know if you have any suggestions.