guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

type_of_target is not suitable for deciding multiclass vs continuous #296

Open · chunqishi opened this issue 9 months ago

chunqishi commented 9 months ago

For a continuous target, the values are sometimes floats with no fractional part. However, type_of_target treats [1.0, 2.0, 3.0, 4.0, 5.0] as multiclass.
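A minimal reproduction, assuming scikit-learn is installed. `sklearn.utils.multiclass.type_of_target` labels a float array as continuous only when at least one value has a non-zero fractional part. Integral-valued floats therefore fall through to the class-count checks and are reported as multiclass once there are more than two unique values.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Integral-valued floats: no fractional part, so not detected as continuous.
y_integral = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Similar data with one genuinely fractional value.
y_fractional = np.array([1.0, 2.5, 3.0, 4.0, 5.0])

print(type_of_target(y_integral))    # multiclass
print(type_of_target(y_fractional))  # continuous
```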

Could you provide a parameter for setting self._target_dtype explicitly? Alternatively, could the type check return continuous when the target has a float dtype and more than 10 unique values?
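The fallback rule proposed above could look like the sketch below. The helper name `infer_target_dtype` and the default threshold of 10 unique values are illustrative assumptions taken from the request. Neither is part of optbinning or scikit-learn. The only library call is `type_of_target`.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def infer_target_dtype(y, max_classes=10):
    """Hypothetical helper: same as type_of_target, except that float
    targets with more than max_classes unique values count as continuous."""
    y = np.asarray(y)
    target_dtype = type_of_target(y)

    # Override only the "multiclass" verdict, and only for float arrays
    # whose number of unique values exceeds the (assumed) class limit.
    if (target_dtype == "multiclass"
            and np.issubdtype(y.dtype, np.floating)
            and np.unique(y).size > max_classes):
        return "continuous"
    return target_dtype


print(infer_target_dtype(np.arange(20, dtype=float)))  # continuous
print(infer_target_dtype([1.0, 2.0, 3.0, 4.0, 5.0]))   # multiclass
```

Under this rule, the five-level example from the issue would still be classified as multiclass because it has fewer than 10 unique values. Any fixed threshold will misclassify some targets, which is an argument for an explicit override parameter.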

def _fit(self, X, y, sample_weight, check_input):
    time_init = time.perf_counter()

    if self.verbose:
        logger.info("Binning process started.")
        logger.info("Options: check parameters.")

    _check_parameters(**self.get_params())

    # check X dtype
    if not isinstance(X, (pd.DataFrame, np.ndarray)):
        raise TypeError("X must be a pandas.DataFrame or numpy.ndarray.")

    # check target dtype
    self._target_dtype = type_of_target(y)

guillermo-navas-palencia commented 9 months ago

Hi @chunqishi.

You are not the first to run into this issue with https://scikit-learn.org/stable/modules/generated/sklearn.utils.multiclass.type_of_target.html. I will think about it.

guillermo-navas-palencia commented 8 months ago

I think it makes sense to implement this parameter in the Binning Process.
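One way such a parameter could work is sketched below. The class `BinningProcessSketch`, its `target_dtype` argument, and the set of accepted values are hypothetical. They are not optbinning's `BinningProcess` API. The sketch shows only one idea: an explicit user-supplied type takes precedence over `type_of_target`, and invalid values are rejected early.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Target types this sketch accepts as an explicit override (assumption).
_SUPPORTED_TARGET_DTYPES = ("binary", "multiclass", "continuous")


class BinningProcessSketch:
    """Hypothetical stand-in for a binning process with a target_dtype option."""

    def __init__(self, target_dtype=None):
        self.target_dtype = target_dtype

    def fit(self, y):
        if self.target_dtype is None:
            # Default behaviour: infer the target type from the data.
            self._target_dtype = type_of_target(y)
        elif self.target_dtype in _SUPPORTED_TARGET_DTYPES:
            # A user-supplied type takes precedence over inference.
            self._target_dtype = self.target_dtype
        else:
            raise ValueError(
                "target_dtype must be None or one of {}; got {!r}.".format(
                    _SUPPORTED_TARGET_DTYPES, self.target_dtype))
        return self


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(BinningProcessSketch().fit(y)._target_dtype)                          # multiclass
print(BinningProcessSketch(target_dtype="continuous").fit(y)._target_dtype)  # continuous
```

With the default of `None`, existing behaviour is unchanged. Users only opt in when inference picks the wrong type.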