guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
434 stars 98 forks

Option to force continuous target type in BinningProcess #275

Closed (goto-loop closed this issue 7 months ago)

goto-loop commented 8 months ago

BinningProcess.fit() currently determines the type of the target variable (binary/continuous/multiclass) using sklearn.utils.multiclass.type_of_target(), which considers a target continuous only if it "is an array-like of floats that are not all integers". However, there are regression problems where every value of the continuous target is an integer, for example when predicting large quantities such as house prices or sales revenue, or in my case the RPM of a machine.
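To make the behaviour concrete, here is a minimal sketch that calls type_of_target() directly on a small target. The RPM values are made up for illustration, and the snippet does not involve BinningProcess itself; it only shows how scikit-learn classifies the same numbers in different representations.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical RPM readings: a regression target whose values are all integers.
rpm = np.array([1500, 3000, 4500, 6000])

detected = {
    # Integer dtype with more than two distinct values.
    "int": type_of_target(rpm),
    # Same values stored as floats: still "all integers" in value.
    "integer-valued floats": type_of_target(rpm.astype(float)),
    # After dividing by 1000, some values are no longer whole numbers.
    "scaled": type_of_target(rpm / 1000),
}

for name, kind in detected.items():
    print(f"{name}: {kind}")
```

Casting to float alone does not help, because the check looks at the values rather than the dtype. Only the scaled version contains non-integer values.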

If I am not mistaken, the only way to use BinningProcess in such cases is to divide all values by a constant such as 1000 so that you end up with "real floats", and to multiply by the same constant afterwards. It would be great if BinningProcess offered a parameter (e.g. force_continuous_target) that lets you skip the type_of_target() call.
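A rough sketch of that workaround follows. The helper scale_target() is hypothetical (it is not part of optbinning or scikit-learn), and the sample values are invented. It divides the target by a constant and checks that scikit-learn then reports it as continuous, since a divisor that leaves every value a whole number would not change the detected type.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def scale_target(y, divisor=1000.0):
    """Hypothetical helper: rescale y and verify it is detected as continuous."""
    y_scaled = np.asarray(y, dtype=float) / divisor
    if type_of_target(y_scaled) != "continuous":
        # E.g. [1000, 2000, 3000] / 1000 is still integer-valued.
        raise ValueError(
            f"Dividing by {divisor} leaves only integer values; "
            "choose a different divisor."
        )
    return y_scaled


y = np.array([1500, 3000, 4500, 6000])  # made-up integer-valued target
y_scaled = scale_target(y)              # this is what would be passed to fit()
restored = y_scaled * 1000.0            # undo the scaling afterwards

# A divisor that keeps every value integral is rejected by the helper.
try:
    scale_target([1000, 2000, 3000])
    bad_divisor_rejected = False
except ValueError:
    bad_divisor_rejected = True
```

The scaled array would then be used as the target in BinningProcess.fit(), and any values expressed in target units would be multiplied by the same constant afterwards.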

guillermo-navas-palencia commented 7 months ago

Hi @goto-loop. To be honest, I would consider this option a bit overkill, since this is really an edge case and binning is not affected by target scaling.