h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add model-agnostic permutation feature importance function #9126

Status: Closed (exalate-issue-sync[bot] closed this issue 1 year ago)

exalate-issue-sync[bot] commented 1 year ago

Permutation feature importance is a great way to get feature importance in a model-agnostic fashion. All our algorithms (except Stacked Ensemble at the moment) have built-in feature importance, but it would still be great to have this feature. It makes sense to have it as a separate function that does not run automatically as part of the model building process. It could also be used as a new method for doing metalearning (model selection) inside a Stacked Ensemble.

Here is the methodology:

A) You have a hold-out dataset (or you use the k-fold cross-validation data).

B) You make predictions using the ensemble model and measure AUC or whichever other metric you prefer (you have already computed these things for the leaderboard). Let's say this gives 0.8 AUC.

C) For each column in the data:

1. You randomly shuffle it.

2. You repeat the scoring with that column randomized (and everything else left correct).

3. You measure AUC. Now let's say AUC is 0.7. The difference between the original AUC and this one (where one feature is wrong), here 0.8 - 0.7 = 0.1, is the importance of that column.

4. You restore the column to its original values and repeat for the next column.
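The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not H2O's implementation: `permutation_importance`, `auc`, the `predict` callable, the `n_repeats` averaging, and the toy data are all hypothetical names and choices introduced here. In practice `predict` would wrap whatever model is being explained.

```python
import numpy as np


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Return (baseline score, per-column mean drop in score after shuffling that column).

    predict: hypothetical callable mapping a 2-D array to a 1-D array of scores.
    n_repeats: an assumption added here to average out shuffle noise;
               the methodology above describes a single shuffle per column.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))          # step B: score on the intact hold-out data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):               # step C: one column at a time
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()                 # the original X is never modified,
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # so the column is "restored" for free
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return baseline, importances


# Toy hold-out set: the label depends only on column 0.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Stand-in "model" that scores rows by column 0 and ignores the other columns.
predict = lambda data: data[:, 0]

baseline, importances = permutation_importance(predict, X, y, auc)
print("baseline AUC:", baseline)
print("importances:", importances)
```

In this toy setup, shuffling column 0 should cost the stand-in model most of its AUC, while shuffling the two ignored columns changes nothing, so their importance is exactly zero.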

References:

https://christophm.github.io/interpretable-ml-book/feature-importance.html

The permutation feature importance measurement was introduced by Breiman (2001) for random forests. Based on this idea, Fisher, Rudin, and Dominici (2018) proposed a model-agnostic version of the feature importance and called it model reliance.

exalate-issue-sync[bot] commented 1 year ago

Neema Mashayekhi commented: [^Use eli5 for permutation imp of black box (H2O open source).pdf]

Attached is a current workaround for getting permutation importance in H2O-3, using a black-box model approach with eli5.

exalate-issue-sync[bot] commented 1 year ago

Neema Mashayekhi commented: Will be tracking the issue through this JIRA: https://h2oai.atlassian.net/browse/PUBDEV-7139

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6504
Assignee: Tomas Fryda
Reporter: Erin LeDell
State: Resolved
Fix Version: N/A
Attachments: Available (Count: 1)
Development PRs: N/A

Attachments From Jira

Attachment Name: Use eli5 for permutation imp of black box (H2O open source).pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6504/Use eli5 for permutation imp of black box (H2O open source).pdf