BinningProcess.fit() currently determines the type of the target variable (binary/continuous/multiclass) using sklearn.utils.multiclass.type_of_target(), which only considers a variable as continuous if it "is an array-like of floats that are not all integers". However, there are regression problems where all values of the continuous target are integers – for example when predicting high values like house prices, sales revenue, or in my case the RPM of a machine.
If I am not mistaken, the only way to use BinningProcess in such cases is by dividing all values by a constant like 1000, later followed by a multiplication, so that you end up with "real floats". It would be great if BinningProcess offered a parameter (e.g. force_continuous_target), which lets you skip the type_of_target() call.
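For illustration, here is a minimal sketch of the behavior described above, calling type_of_target() directly rather than going through BinningProcess. The RPM values are made up for the example, and the exact labels may depend on the scikit-learn version; on the versions I am aware of, a whole-number target is reported as multiclass regardless of dtype, and only the scaled version is treated as continuous.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical RPM readings: a regression target whose values are all whole numbers.
rpm = np.array([1200, 1500, 1800, 2400, 3000])

# Integer dtype: not recognized as a continuous target.
as_int = type_of_target(rpm)

# Casting to float does not help, because the values are still whole numbers.
as_float = type_of_target(rpm.astype(float))

# Workaround from the text above: scale so the values have fractional parts.
scaled = type_of_target(rpm / 1000)

print(as_int, as_float, scaled)
```

Note that the scaling trick only works if at least one scaled value has a fractional part.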
Hi @goto-loop. To be honest, I would consider this option a bit overkill since this is really an edge case and binning is not affected by target scaling.