Is there any interest in implementing some supervised discretization algorithms? I would imagine they might work better than the `quantile` or `kmeans` strategies in `KBinsDiscretizer`, particularly on somewhat imbalanced classes.
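To make the supervised/unsupervised distinction concrete: the existing `KBinsDiscretizer` strategies place bin edges from `X` alone (`fit` ignores `y`), whereas a supervised method also uses the class labels. The sketch below contrasts `strategy="quantile"` with a deliberately simple supervised stand-in: cut points taken from a shallow `DecisionTreeClassifier`. This is not CAIM or any method from the survey. The helper `tree_bin_edges` and the toy imbalanced dataset are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier


def tree_bin_edges(x, y, max_bins=5, random_state=0):
    """Hypothetical helper: supervised cut points from a shallow decision tree.

    A tree with at most `max_bins` leaves has at most `max_bins - 1` split
    nodes; their thresholds on the single feature are used as bin edges.
    """
    tree = DecisionTreeClassifier(
        max_leaf_nodes=max_bins, class_weight="balanced", random_state=random_state
    )
    tree.fit(np.asarray(x).reshape(-1, 1), y)
    is_split = tree.tree_.children_left != -1  # leaves have children_left == -1
    return np.unique(tree.tree_.threshold[is_split])  # sorted, de-duplicated


# Toy one-feature problem with a 90/10 class imbalance (illustrative only).
X, y = make_classification(
    n_samples=1000, n_features=1, n_informative=1, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9, 0.1], random_state=0,
)
x = X[:, 0]

# Unsupervised: edges depend only on the distribution of x.
kbd = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
kbd.fit(X)
quantile_edges = kbd.bin_edges_[0]  # includes min and max of x

# Supervised stand-in: edges depend on where the class labels change.
tree_edges = tree_bin_edges(x, y, max_bins=5)
x_binned = np.digitize(x, tree_edges)  # bin index per sample

print("quantile inner edges:", quantile_edges[1:-1])
print("tree edges:          ", tree_edges)
```

The quantile edges are spread evenly over the data, while the tree edges tend to cluster where the minority class starts to dominate, which is the kind of behaviour that motivates supervised discretization on imbalanced problems.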
I haven't done a proper review, but well-established methods include, for instance:
- CAIM discretization algorithm, Kurgan 2004 (511 citations). A Python implementation is available here, but it doesn't seem to be widely used.
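As a rough illustration of what CAIM optimizes, here is a minimal NumPy sketch based on my reading of Kurgan 2004. For a set of intervals, CAIM = (1/n) Σ_r max_r² / M_r, where n is the number of intervals, max_r is the largest per-class count in interval r and M_r is the number of samples in interval r. Cut points are added greedily, keeping a new one while it increases CAIM or while there are still fewer intervals than classes. Assumptions: candidate cut points are midpoints between consecutive distinct values, the function names `caim_score` and `caim_discretize` are hypothetical, and the search is quadratic and unoptimized. This is not the linked implementation.

```python
import numpy as np


def caim_score(x, y, edges):
    """CAIM criterion for the intervals defined by the sorted cut points `edges`."""
    x, y = np.asarray(x), np.asarray(y)
    classes = np.unique(y)
    idx = np.searchsorted(edges, x)  # interval index of every sample
    n_intervals = len(edges) + 1
    score = 0.0
    for r in range(n_intervals):
        mask = idx == r
        total = mask.sum()
        if total == 0:  # cannot happen with midpoint cuts, kept for safety
            continue
        counts = np.array([(y[mask] == c).sum() for c in classes])
        score += counts.max() ** 2 / total  # max_r**2 / M_r
    return score / n_intervals


def caim_discretize(x, y):
    """Greedy top-down CAIM discretization of one feature (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n_classes = len(np.unique(y))
    values = np.unique(x)
    # Assumption: candidate boundaries are midpoints of consecutive distinct values.
    candidates = list((values[:-1] + values[1:]) / 2)
    edges, global_caim = [], 0.0
    while candidates:
        scores = [caim_score(x, y, np.sort(edges + [c])) for c in candidates]
        best = int(np.argmax(scores))
        # Accept if CAIM improves, or if there are fewer intervals than classes.
        if scores[best] > global_caim or len(edges) + 1 < n_classes:
            global_caim = scores[best]
            edges.append(candidates.pop(best))
        else:
            break
    return np.sort(np.array(edges))
```

The returned cut points can then be applied with `np.digitize(x, edges)`. Note that, unlike the fixed `n_bins` of `KBinsDiscretizer`, the number of bins here falls out of the stopping rule.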
There is a detailed review of supervised discretization techniques in "A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning", Garcia et al. 2012. I have only skimmed through it; it is quite extensive. Here is a figure that, as far as I understood, compares the classification accuracy of a naive Bayes classifier across discretization methods, averaged over 40 datasets:

[Figure from Garcia et al. 2012: naive Bayes classification accuracy per discretization method, averaged over 40 datasets]
Some of the supervised algorithms in the top left corner achieve classification accuracy comparable to the "equal width" and "equal frequency" approaches (the `uniform` and `quantile` strategies in `KBinsDiscretizer`), but with 2-4x fewer bins.
Thoughts? It could also be something for scikit-learn-extra.