guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
434 stars 98 forks

Result of default OptimalBinning is worse compared with the one that has more restrictions #311

Closed nic9lif3 closed 3 months ago

nic9lif3 commented 3 months ago

Hi @guillermo-navas-palencia,

Today I found a strange result with OptimalBinning.

With the following code, the total IV is 0.001707:

import optbinning

# monotonic_trend is left at its default ("auto")
optb = optbinning.OptimalBinning(
    min_prebin_size=0.01,
    max_n_prebins=40,
    # monotonic_trend='ascending'
)
optb.fit(data_train['avginstallast24m_3658937A'], data_train['target'])
optb.binning_table.build()

[image]

But when I add the monotonic_trend constraint, the IV improves to 0.005960:

import optbinning

# Same settings, but with an explicit descending trend
optb = optbinning.OptimalBinning(
    min_prebin_size=0.01,
    max_n_prebins=40,
    monotonic_trend='descending'
)
optb.fit(data_train['avginstallast24m_3658937A'], data_train['target'])
optb.binning_table.build()

[image]

The data can be found here: datatest.csv

Could you explain why this happens? Shouldn't fewer constraints give a higher IV? Thanks.

guillermo-navas-palencia commented 3 months ago

Hi @nic9lif3,

The default monotonic_trend="auto" is not infallible. In this particular case, the monotonic trend it predicted was not optimal. In general, when event rates are close to flat, the automatic trend selection might fail. However, this situation is much less likely for features with medium/high IV.
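
To illustrate why the chosen trend direction matters so much when event rates are nearly flat, here is a minimal toy sketch. It is not optbinning's solver: it brute-forces every way of merging consecutive prebins and keeps the best IV under each trend. The prebin counts and the helper names (`iv`, `merge`, `is_monotonic`, `best_iv`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import log


def iv(bins):
    """Information Value for a list of (non_events, events) counts per bin.

    Assumes every bin has at least one event and one non-event.
    """
    total_ne = sum(ne for ne, _ in bins)
    total_e = sum(e for _, e in bins)
    value = 0.0
    for ne, e in bins:
        p_ne = ne / total_ne
        p_e = e / total_e
        value += (p_ne - p_e) * log(p_ne / p_e)
    return value


def merge(prebins, cuts):
    """Merge consecutive prebins, splitting only at the given cut positions."""
    edges = [0, *cuts, len(prebins)]
    return [
        (sum(ne for ne, _ in prebins[a:b]), sum(e for _, e in prebins[a:b]))
        for a, b in zip(edges, edges[1:])
    ]


def is_monotonic(bins, trend):
    """Check whether the event rates of the bins follow the requested trend."""
    rates = [e / (ne + e) for ne, e in bins]
    pairs = list(zip(rates, rates[1:]))
    if trend == "ascending":
        return all(a <= b for a, b in pairs)
    if trend == "descending":
        return all(a >= b for a, b in pairs)
    return True  # trend is None: no monotonicity constraint


def best_iv(prebins, trend):
    """Best IV over all merges of consecutive prebins that satisfy the trend."""
    best = 0.0
    n = len(prebins)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bins = merge(prebins, cuts)
            if is_monotonic(bins, trend):
                best = max(best, iv(bins))
    return best


# Hypothetical (non_events, events) counts with nearly flat event rates
prebins = [(970, 30), (972, 28), (969, 31), (973, 27), (975, 25)]

results = {t: best_iv(prebins, t) for t in (None, "ascending", "descending")}
for trend, value in results.items():
    print(trend, round(value, 6))
```

In this toy data the event rates drift slightly downward, so no split into two or more bins is ascending and the best ascending solution collapses to a single bin with zero IV, while the descending solution keeps some signal. The unconstrained optimum is never worse than either. The same effect can explain the results above: "auto" still imposes a trend, and if the predicted direction is the wrong one, an explicitly set trend can yield a higher IV.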