guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

How to fix the number of bins equal to the number of categories #308

Closed josp70 closed 3 months ago

josp70 commented 3 months ago

I have a dataset with a binary target and a categorical variable that takes 3 different values. I want to report the binning table statistics with the number of bins equal to 3 (without optimization). To do that I set the parameter min_n_bins=3, but the result is a table with a single bin. If I leave min_n_bins at its default value, None, I get a result with 2 bins.

Is there a way to build a binning with each category as a bin (no category grouping)?

guillermo-navas-palencia commented 3 months ago

Hi, @josp70.

For this case, you can use the parameters user_splits and user_splits_fixed. There is an example in this tutorial: https://gnpalencia.org/optbinning/tutorials/tutorial_binary.html#User-defined-split-points.

If the solution turns out to be infeasible (check information()), disable all other constraints. If you are willing to provide a reproducible example, I can have a look.

josp70 commented 3 months ago

@guillermo-navas-palencia that did it:

import numpy as np
# One pre-bin per category, each marked as fixed so that no categories are merged
user_splits = np.array([[x] for x in data[variable].unique()], dtype=object)
user_splits_fixed = [True] * len(user_splits)

Here data is my DataFrame and variable is the name of the categorical variable to plot.

Thanks, great package!!!
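
For anyone who wants to sanity-check the one-bin-per-category table, here is a minimal sketch in plain Python that computes the per-category count, event rate, WoE and IV by hand. It does not use optbinning. The function name per_category_table and the toy data are made up for illustration, and the WoE convention (non-event share over event share) is an assumption about what the library's binning table reports, so treat it as a cross-check and not as the package's implementation.

```python
import math
from collections import Counter


def per_category_table(categories, target):
    """Return one row per category: count, events, event rate, WoE, IV.

    Hypothetical helper for illustration only. Assumes a binary target
    (0/1) and that every category has at least one event and one
    non-event, otherwise the logarithm is undefined.
    """
    totals = Counter(categories)
    events = Counter(c for c, y in zip(categories, target) if y == 1)
    n_event = sum(events.values())
    n_nonevent = len(categories) - n_event

    rows = {}
    for cat in sorted(totals):
        e = events[cat]
        ne = totals[cat] - e
        # Assumed convention: WoE = ln(non-event share / event share)
        woe = math.log((ne / n_nonevent) / (e / n_event))
        iv = (ne / n_nonevent - e / n_event) * woe
        rows[cat] = {
            "count": totals[cat],
            "event": e,
            "event_rate": e / totals[cat],
            "woe": woe,
            "iv": iv,
        }
    return rows


# Toy data: three categories, binary target
categories = ["a"] * 4 + ["b"] * 4 + ["c"] * 2
target = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]

table = per_category_table(categories, target)
for cat, row in table.items():
    print(cat, row)
```

Each category ends up as its own row, which is the shape the fixed user_splits above should produce. If the numbers differ from the library's table, the first things to check are the WoE sign convention and how missing or rare categories are handled.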