guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0
443 stars 99 forks

Model variable screening #118

Open cfkstat opened 3 years ago

cfkstat commented 3 years ago

How can I develop a scorecard that uses lasso or ridge regularization for variable screening, so that the model generalizes better than one fitted on the full set of variables?

cfkstat commented 3 years ago

My guess is to change the estimator parameter of the Scorecard class, for example by replacing LogisticRegression with Glmnet, but I'm not sure this will work.

guillermo-navas-palencia commented 3 years ago

Hi @cfkstat,

Any class supporting .fit(), .predict() and .predict_proba() is suitable to be used as an estimator. LogisticRegression in sklearn supports lasso (L1) and ridge (L2) regularization via the penalty parameter. The problem is that, after fitting, there is no function to filter out those variables with abs(coefficient) < threshold.

If I understood correctly, what you are proposing would require a function to retrieve the estimator support, which is currently missing.
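
In the meantime, the filtering step can be sketched by hand with plain scikit-learn, outside of optbinning. This is only a rough illustration under assumptions: the synthetic data stands in for already-transformed scorecard variables, the value of C and the threshold are arbitrary, and coefficient_support is a hypothetical helper, not an existing optbinning or sklearn function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for transformed scorecard variables (assumption).
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Lasso (L1) penalty; the liblinear solver supports it.
# Smaller C means stronger regularization, so more coefficients shrink to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
clf.fit(X, y)


def coefficient_support(estimator, threshold=1e-6):
    """Hypothetical helper: boolean mask of the variables to keep,
    i.e. those with abs(coefficient) >= threshold."""
    return np.abs(estimator.coef_.ravel()) >= threshold


support = coefficient_support(clf)
X_selected = X[:, support]  # keep only the columns that pass the filter
print(support.sum(), "of", X.shape[1], "variables kept")
```

The resulting mask could then be used to decide which variables to keep before refitting the scorecard on the reduced set. Note that ridge (L2) shrinks coefficients without setting them exactly to zero, so with ridge the choice of threshold matters much more than with lasso.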