scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

Supervised discretization #15551

Open rth opened 4 years ago

rth commented 4 years ago

Is there any interest in implementing some supervised discretization algorithms? I would imagine that they might work better than the quantile or kmeans strategies in KBinsDiscretizer, particularly on somewhat imbalanced classes.
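To make the baseline concrete, here is a minimal sketch of the existing unsupervised strategies in `KBinsDiscretizer`. The toy data (95 samples of one class, 5 of another) and the choice of `n_bins=4` are assumptions made up for illustration; the point is only that the bin edges are computed from `X` alone.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical imbalanced toy data: 95 majority samples, 5 minority samples.
rng = np.random.RandomState(0)
X = np.concatenate(
    [rng.normal(0.0, 1.0, 95), rng.normal(4.0, 0.5, 5)]
).reshape(-1, 1)
y = np.array([0] * 95 + [1] * 5)

edges = {}
for strategy in ("uniform", "quantile", "kmeans"):
    est = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    # y is accepted for API compatibility but ignored: the edges depend on X only.
    est.fit(X, y)
    edges[strategy] = est.bin_edges_[0]
    print(strategy, np.round(edges[strategy], 2))
```

Since none of these strategies looks at `y`, nothing guarantees that an edge falls near the boundary between the two classes, which is what a supervised method would aim for.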

There is a detailed review of supervised discretization techniques in "A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning", Garcia et al 2012. I have only skimmed through it; it is quite extensive. Here is a figure comparing the classification accuracy of a naive Bayes classifier, averaged over 40 datasets as far as I understood:

[Figure: 1469_2013-Garcia-IEEETKDE]

There, some of the supervised algorithms in the top left corner have classification accuracy comparable to the "equal width" or "equal frequency" approaches, but with 2-4x fewer bins.

I haven't done a proper review, but among well-established methods there are, for instance,

Thoughts? It could also be something for scikit-learn-extra.

alfaro96 commented 4 years ago

If there is any interest in working on this PR, I would be happy to implement the MDLP algorithm.
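For context, here is a rough sketch of what MDLP does for a single feature, assuming the usual Fayyad and Irani formulation: recursively pick the cut that minimizes class entropy, and accept it only if the information gain exceeds a minimum description length threshold. The function names (`entropy`, `mdlp_cut_points`, `_split`) are hypothetical and not part of scikit-learn, and this version scans every midpoint between distinct values rather than only boundary points.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def _split(xs, ys, cuts):
    """Recursively add accepted cut points for sorted values xs with labels ys."""
    n = len(xs)
    if n < 2:
        return
    ent_s = entropy(ys)
    if ent_s == 0.0:
        return  # pure partition, nothing to separate
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between identical values
        # Size-weighted class entropy of the two sides of the candidate cut.
        cond = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or cond < best[0]:
            best = (cond, i)
    if best is None:
        return
    cond, i = best
    left, right = ys[:i], ys[i:]
    gain = ent_s - cond
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (
        k * ent_s - k1 * entropy(left) - k2 * entropy(right)
    )
    # MDLP stopping rule: reject the cut if the gain does not pay for its cost.
    if gain <= (log2(n - 1) + delta) / n:
        return
    cuts.append((xs[i - 1] + xs[i]) / 2)
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)


def mdlp_cut_points(values, labels):
    """Return sorted MDLP cut points for one numeric feature."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    cuts = []
    _split(xs, ys, cuts)
    return sorted(cuts)


# Toy example: class 0 for values 1..10, class 1 for values 11..20.
values = list(range(1, 21))
labels = [0] * 10 + [1] * 10
print(mdlp_cut_points(values, labels))  # [10.5]
```

Unlike the fixed `n_bins` in KBinsDiscretizer, the number of bins here is decided by the stopping rule, so a feature that carries no class information ends up with no cuts at all.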