Thanks. We'll fix it. Would you mind sharing what application you use flaml for? Feel free to chat on gitter.
As far as I can tell - though admittedly I've only superficially tested this - simply changing line 1021 in automl.py to
```python
assert split_type in [None, "stratified", "uniform", "time", "group"]
```
and specifying the 'group' split_type when fitting fixes this issue and works as intended. I don't know if this feature was disabled for a specific reason or if it breaks something somewhere else, but it seems to work for me.
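For reference, a minimal sketch of the resulting fit call (synthetic data; the eval_method, time_budget, and group values here are just placeholders, and it assumes the relaxed assert above so that "group" is accepted for a classification task):

```python
import numpy as np
from sklearn.datasets import make_classification
from flaml import AutoML

# Synthetic data with an arbitrary group id per row (placeholder values).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
groups = np.random.RandomState(0).randint(0, 20, size=len(y))

automl = AutoML()
automl.fit(
    X_train=X,
    y_train=y,
    task="classification",
    eval_method="cv",
    split_type="group",  # only accepted once the assert above is relaxed
    groups=groups,
    time_budget=30,
)
```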
Hi! I made a request to include Group K-fold cross-validation a few months ago and it worked great then, but now I'm having some trouble with the feature. I'm not sure if it was an API change or if I'm doing something wrong, but I pass in the groups (as strings) using the `groups` argument when fitting an AutoML object and it does not appear to have any effect. The intended behavior is that the CV splitting should be done across the groups defined in `train_groups`, but that does not seem to be happening: there is no indication in the log that GroupKFold is being used, and the model overfits to a degree that would not be possible without data leakage.
I tried setting the split type to 'group' after looking into the library's code, but that is not valid for classification. It seems that GroupKFold is only used when `split_type` is set to 'group', and that is only possible for the ranking objective. Is this intentional?
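For reference, this is roughly the kind of call I am making (a minimal sketch with synthetic stand-ins; the real X_train, y_train, and train_groups come from my own dataset, and the other fit arguments are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from flaml import AutoML

# Synthetic stand-ins for the real data; train_groups holds one group label (a string) per row.
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=0)
train_groups = np.random.RandomState(0).choice(["a", "b", "c", "d"], size=len(y_train))

automl = AutoML()
automl.fit(
    X_train=X_train,
    y_train=y_train,
    task="classification",
    eval_method="cv",
    groups=train_groups,  # expected to trigger GroupKFold-style splitting, but seems to have no effect
    time_budget=30,
)
```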