uclamii / model_tuner

A library to tune the hyperparameters of common ML models. Supports calibration and custom pipelines.
Apache License 2.0

Enhancement Request for stratify_y=True in Regression Tasks #38

Closed lshpaner closed 1 month ago

lshpaner commented 1 month ago

Issue: Enhancement Request for stratify_y=True in Regression Tasks

Description: stratify_y=True works as expected for binary classification, where the target takes discrete values (typically 0 or 1). In regression, where the target is continuous, the same parameter raises the following exception:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

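This occurs because stratification treats each distinct value of y as its own class. With a continuous target, most values are unique, so nearly every "class" has only one member and cannot be divided between the train and test sets.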
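
A minimal sketch of how this failure can be reproduced, assuming the split is ultimately delegated to scikit-learn's train_test_split (the issue does not show model_tuner's internals, so that is an assumption). The synthetic data and variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative, not from the issue).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)  # continuous target: effectively every value is unique

# Stratifying on y directly treats each unique value as a class of size 1.
error_message = None
try:
    train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Every unique float in y becomes a singleton class here, which is why the stratified splitter refuses to proceed.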

Proposed Enhancement: For regression tasks where continuous target values are provided, it would be helpful to either:

  1. Throw a clearer exception: Provide a more informative error message guiding users on the next steps. For example:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2. Bin your continuous y values into categories using pandas.qcut (for quantiles) or pandas.cut (for custom bins) to ensure the same distribution of bins among split sets of data.
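
The binning approach suggested in that message could look roughly like the sketch below. It calls scikit-learn's train_test_split directly rather than model_tuner, since the issue does not specify how model_tuner would accept pre-binned labels; the data, column names, and the choice of four quantile bins are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.normal(size=200), name="target")

# Bin the continuous target into quartiles; labels=False returns
# integer bin codes (0..3) instead of interval labels.
y_bins = pd.qcut(y, q=4, labels=False)

# Stratify on the bins, but keep the original continuous y as the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y_bins, random_state=42
)

# Check that each quartile is represented evenly in the test set.
test_bin_counts = y_bins.loc[y_test.index].value_counts().sort_index()
print(test_bin_counts)
```

pandas.cut can be substituted for pandas.qcut when custom bin edges are preferred, as long as every resulting bin contains at least two samples.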

elemets commented 1 month ago

This is a good suggestion, but other issues are currently taking priority. This may be reopened in the future.