guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

Getting individual plots from the scorecard? #334

Closed: lcrmorin closed this issue 1 month ago

lcrmorin commented 1 month ago

After looking through the documentation, it seems that there is no way to plot binning tables from a scorecard.

My 'problem' is that I end up performing the same binning multiple times.

For data exploration, I do something like this:

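```python
import optbinning
import pandas as pd

# `data` is a pandas DataFrame holding the features and the binary target
target = 'loan_status'
# Use != rather than `not in`: `c not in target` is a substring test on the string
features = [c for c in data.columns if c != target]

for c in features:
    dtype = 'numerical' if pd.api.types.is_numeric_dtype(data[c]) else 'categorical'

    optb = optbinning.OptimalBinning(name=c, dtype=dtype, solver="cp")
    optb.fit(data[c].values, data[target].values)

    # The binning table must be built before it can be plotted
    binning_table = optb.binning_table
    binning_table.build()
    binning_table.plot(metric="event_rate")
```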
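Because `target` is a plain string, filtering columns with `c not in target` performs a substring test rather than an equality test. Any feature whose name happens to be a substring of `'loan_status'` would therefore be silently excluded. The following stdlib-only illustration uses hypothetical column names:

```python
# Hypothetical column names, for illustration only
columns = ["loan_amnt", "loan", "status", "int_rate", "loan_status"]
target = "loan_status"

# Substring test: 'loan' and 'status' are substrings of 'loan_status', so they are dropped too
substring_filter = [c for c in columns if c not in target]

# Equality test: only the target column itself is excluded
equality_filter = [c for c in columns if c != target]

print(substring_filter)  # ['loan_amnt', 'int_rate']
print(equality_filter)   # ['loan_amnt', 'loan', 'status', 'int_rate']
```

Comparing with `!=` (or with `c not in [target]` when the target is wrapped in a list) keeps every feature except the target.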

Then for modelling:

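```python
from optbinning import BinningProcess, Scorecard
from sklearn.linear_model import LogisticRegression

scorecard = Scorecard(
    binning_process=BinningProcess(features),
    estimator=LogisticRegression(),
)

scorecard.fit(data[features], data[target])

scorecard.table(style="detailed")
```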

The scorecard table lists all the splits. However, there is no easy way to get back to the individual plots.
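For each bin, the `event_rate` metric plotted by the binning table is the fraction of records in that bin whose target equals 1. As a rough sketch of what those plots summarise, the per-bin event rate can be recomputed from a list of numerical split points. This sketch rests on a few assumptions:

- Bins are left-closed intervals `[s_i, s_{i+1})` delimited by the splits, which is the convention the binning table uses when displaying numerical bins.
- The special and missing bins that optbinning also reports are ignored.
- `event_rate_by_bin` is a hypothetical helper, not part of the optbinning API, and the data below is toy data.

```python
from bisect import bisect_right


def event_rate_by_bin(values, targets, splits):
    """Hypothetical helper: (bin_label, count, event_rate) per bin defined by sorted splits."""
    n_bins = len(splits) + 1
    counts = [0] * n_bins
    events = [0] * n_bins
    for x, y in zip(values, targets):
        # bisect_right sends x == s_i into the bin starting at s_i, i.e. [s_i, s_{i+1})
        i = bisect_right(splits, x)
        counts[i] += 1
        events[i] += y
    edges = [float("-inf")] + list(splits) + [float("inf")]
    rows = []
    for i in range(n_bins):
        left = "(" if i == 0 else "["  # the first bin is open at -inf
        label = f"{left}{edges[i]}, {edges[i + 1]})"
        rate = events[i] / counts[i] if counts[i] else float("nan")
        rows.append((label, counts[i], rate))
    return rows


# Toy data (hypothetical): one numerical feature, binary target, two split points
values = [1, 2, 3, 4, 5, 6, 7, 8]
targets = [0, 0, 1, 0, 1, 1, 1, 1]
splits = [3, 6]

rows = event_rate_by_bin(values, targets, splits)
for label, count, rate in rows:
    print(label, count, round(rate, 3))
```

This only reproduces the numbers behind the plot. The fitted binning objects that already hold these statistics remain the more direct source.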

guillermo-navas-palencia commented 1 month ago

Hi @lcrmorin. After fitting, the scorecard class has the attribute `binning_process_`, so you can retrieve the individual plots from the fitted binning process (see the binning process tutorials). Let me know if that works for you.

lcrmorin commented 1 month ago

Indeed, after fitting the scorecard, the following code achieves the same result as the data exploration loop above:

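```python
# Retrieve each fitted OptimalBinning from the scorecard's binning process and plot it
for c in features:
    optb = scorecard.binning_process_.get_binned_variable(c)
    optb.binning_table.plot(metric="event_rate")
```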