guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

Sample weight problem #254

Closed: matwroblewski closed this issue 7 months ago

matwroblewski commented 1 year ago

Hi! There appears to be an inconsistency in the add_constraint_min_max_bin_size function when sample weights are used. The function compares bin_size, which is based on n_records (the sum of the sample weights in a given bin), with min_bin_size. However, min_bin_size is computed here https://github.com/guillermo-navas-palencia/optbinning/blob/master/optbinning/binning/continuous_binning.py#L759 without taking sample weights into account.

As a result, when the weights are relatively small, the optimization can end up producing no split at all, even though the prebinning step did produce candidate splits.
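A standalone sketch of the mismatch (it does not call optbinning; `weighted_bin_sizes` and `check_min_bin_size` are hypothetical helpers). It assumes, as described above, that the threshold is derived from the raw record count, here taken to be `ceil(min_bin_size * n_samples)`, while each bin's size is a sum of weights. With weights of 1/16, both bins fail the unweighted check even though each holds at least 30% of the total weight. Scaling the fraction by the total weight instead makes both bins feasible.

```python
import math


def weighted_bin_sizes(bin_ids, weights, n_bins):
    """Sum the sample weights falling into each bin (a weighted n_records)."""
    sizes = [0.0] * n_bins
    for b, w in zip(bin_ids, weights):
        sizes[b] += w
    return sizes


def check_min_bin_size(bin_sizes, threshold):
    """Return, for each bin, whether its size reaches the threshold."""
    return [size >= threshold for size in bin_sizes]


# Hypothetical data: 1000 records split 700/300 into two prebins,
# every record carrying a small weight of 1/16.
n_samples = 1000
bin_ids = [0] * 700 + [1] * 300
weights = [1 / 16] * n_samples
min_bin_size = 0.25  # each bin must hold at least 25% of the data

sizes = weighted_bin_sizes(bin_ids, weights, n_bins=2)  # [43.75, 18.75]

# Inconsistent: weighted sizes vs. a threshold from the raw count (250).
unweighted_threshold = math.ceil(min_bin_size * n_samples)
unweighted_check = check_min_bin_size(sizes, unweighted_threshold)

# Consistent: threshold from the total weight (0.25 * 62.5 = 15.625).
weighted_threshold = min_bin_size * sum(weights)
weighted_check = check_min_bin_size(sizes, weighted_threshold)

print(unweighted_check)  # [False, False]
print(weighted_check)    # [True, True]
```

In this sketch, every candidate bin fails the unweighted constraint. That is consistent with the report that no split survives when the weights are small.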

The problem occurs when the min_bin_size parameter is specified. I suspect the same applies to max_bin_size.
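If max_bin_size is handled the same way, the failure would be mirrored: large weights, rather than small ones, would make bins look too big. The sketch below is hypothetical (`check_max_bin_size` is not an optbinning function). It assumes the unweighted threshold is `floor(max_bin_size * n_samples)`; the exact rounding optbinning uses for the maximum is an assumption here.

```python
import math


def check_max_bin_size(bin_sizes, threshold):
    """Return, for each bin, whether its size stays within the threshold."""
    return [size <= threshold for size in bin_sizes]


# Hypothetical data: 1000 records split 400/600, each with weight 4.
n_samples = 1000
weights = [4.0] * n_samples
bin_sizes = [400 * 4.0, 600 * 4.0]  # weighted n_records per bin
max_bin_size = 0.75                 # each bin may hold at most 75% of the data

# Threshold from the raw record count: floor(0.75 * 1000) = 750.
unweighted_threshold = math.floor(max_bin_size * n_samples)
unweighted_check = check_max_bin_size(bin_sizes, unweighted_threshold)

# Threshold from the total weight: 0.75 * 4000 = 3000.
weighted_threshold = max_bin_size * sum(weights)
weighted_check = check_max_bin_size(bin_sizes, weighted_threshold)

print(unweighted_check)  # [False, False]
print(weighted_check)    # [True, True]
```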

guillermo-navas-palencia commented 9 months ago

Hi @MateuszWroblewski3010.

A bit late, but could you please provide a reproducible example? Thanks!