guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
435 stars 98 forks

[BUG] argument metric_missing=0 is ignored when points for missing cat is calculated in scorecard table #226

Closed peterpanmj closed 1 year ago

peterpanmj commented 1 year ago

description

Scorecard._fit ignores the metric_missing=0 parameter when building the scorecard.

example

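import numpy as np
import pandas as pd

# 100 rows with a missing 'var' (mostly target=1) and 100 rows with 'A' (mostly target=0)
data = pd.DataFrame(
    data={
        'target': np.hstack((
            np.random.choice([0, 1], 100, p=[0.1, 0.9]),
            np.random.choice([0, 1], 100, p=[0.9, 0.1]),
        )),
        'var': [np.nan] * 100 + ['A'] * 100,
    }
)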

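from optbinning import Scorecard
from sklearn.linear_model import LogisticRegression

# binning_process and scaling_method_params are defined elsewhere (not shown in this report)
scorecard3 = Scorecard(binning_process=binning_process,
                       estimator=LogisticRegression(),
                       scaling_method="min_max",
                       scaling_method_params=scaling_method_params
                      ).fit(data, data.target, metric_missing=0, metric_special=0)

print(scorecard3.table(style='detailed'))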

current behaviour

In scorecard3.table(style='detailed'), the points for the missing bin are some positive number. However, the correct value should be zero: since there are only two bins, one bin gets 100 and the other gets 0.
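The reasoning about the two bins can be checked with a small worked example. This is only a sketch of plain min-max scaling for a single variable, assuming bounds of 0 and 100 (the actual scaling_method_params are not shown in this report). The helper min_max_scale and the raw values are made up for illustration and are not part of optbinning.

```python
def min_max_scale(raw_points, lower=0.0, upper=100.0):
    """Linearly map raw points so the smallest becomes `lower` and the largest `upper`."""
    lo, hi = min(raw_points), max(raw_points)
    if hi == lo:
        raise ValueError("all raw points are equal; min-max scaling is undefined")
    return [lower + (p - lo) / (hi - lo) * (upper - lower) for p in raw_points]


# Two bins with arbitrary, made-up raw points (e.g. coefficient * WoE).
raw = [-1.3, 2.1]
scaled = min_max_scale(raw)
print(scaled)
```

With exactly two bins, the lower raw value always maps to the lower bound and the higher one to the upper bound, whatever the raw values are.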

expected results

Simply setting metric_special='empirical' gives the correct results, even though there are no special cases in the data or in the binning_process.

scorecard1 = Scorecard(binning_process=binning_process,
                       estimator=LogisticRegression(),
                       scaling_method="min_max",
                       scaling_method_params=scaling_method_params
                      ).fit(data, data.target,metric_missing=0, metric_special='empirical')

print(scorecard1.table(style='detailed'))

I have a fix for that. It is actually quite obvious: the source code simply ignores the metric_missing argument when metric_special != 'empirical'. However, I found no docs about where and how to add new tests in this project. Can anyone give me some info?
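To make the described pattern concrete, here is a hypothetical sketch of how such a bug can arise. It is not optbinning's actual source: the function names, the nesting, and the fallback to the empirical value are all assumptions chosen to match the behaviour reported above.

```python
# Hypothetical illustration only; these helpers are NOT taken from optbinning.

def missing_metric_buggy(empirical_value, metric_missing, metric_special):
    """Assumed bug pattern: the metric_missing override is only reached
    when metric_special == 'empirical'."""
    if metric_special == "empirical":
        if metric_missing != "empirical":
            return metric_missing
    # Assumption: otherwise the empirical value of the missing bin is kept.
    return empirical_value


def missing_metric_fixed(empirical_value, metric_missing, metric_special):
    """Assumed fix: metric_missing is handled independently of metric_special."""
    if metric_missing != "empirical":
        return metric_missing
    return empirical_value


empirical = 1.7  # made-up empirical metric for the missing bin

print(missing_metric_buggy(empirical, 0, 0))            # override ignored
print(missing_metric_buggy(empirical, 0, "empirical"))  # override applied
print(missing_metric_fixed(empirical, 0, 0))            # override applied
```

In this sketch, the buggy variant reproduces both observations from the report: metric_missing=0 is ignored with metric_special=0, but honoured with metric_special='empirical'.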

guillermo-navas-palencia commented 1 year ago

Thank you @peterpanmj. I commented on the pull request.