Closed exalate-issue-sync[bot] closed 1 year ago
Neema Mashayekhi commented: [^Use eli5 for permutation imp of black box (H2O open source).pdf]
Attached is a current workaround to get permutation importance for H2O-3, using a black-box model approach with eli5.
Neema Mashayekhi commented: Will be tracking the issue through this JIRA: https://h2oai.atlassian.net/browse/PUBDEV-7139
JIRA Issue Migration Info
Jira Issue: PUBDEV-6504 Assignee: Tomas Fryda Reporter: Erin LeDell State: Resolved Fix Version: N/A Attachments: Available (Count: 1) Development PRs: N/A
Attachments From Jira
Attachment Name: Use eli5 for permutation imp of black box (H2O open source).pdf Attached By: Neema Mashayekhi File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-6504/Use eli5 for permutation imp of black box (H2O open source).pdf
Permutation feature importance is a great way to get feature importance in a model-agnostic fashion. All our algorithms (except Stacked Ensemble at the moment) have built-in feature importance, but it would be great to offer permutation importance as well. It makes sense to have it as a separate function that does not run automatically as part of the model building process. This can also be used as a new method for doing metalearning (model selection) inside a Stacked Ensemble.
Here is the methodology:
A) You have a hold-out dataset (or you use k-fold cross-validation).
B) You make predictions using the ensemble model and measure AUC or whichever other metric you prefer (you have already computed these things for the leaderboard). Let's say this gives 0.8 AUC.
C) For each column in the data:
  1) you randomly shuffle it
  2) you repeat the scoring with that column randomized (and everything else correct)
  3) you measure AUC. Now let's say AUC is 0.7. The difference between the original AUC and this one (where one feature is wrong), here 0.8 - 0.7 = 0.1, is the importance of that column
  4) you bring this column back to normal and repeat for the next column
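The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not H2O code: `permutation_importance`, `toy_predict`, and `accuracy` are hypothetical names, the model is a stand-in callable, accuracy replaces AUC to keep the example dependency-free, and the `n_repeats` averaging is an addition not described in the methodology above.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Return the baseline score and the mean score drop per shuffled column.

    Hypothetical helper: `predict` maps a feature matrix to predictions and
    `metric(y_true, y_pred)` returns a score where higher is better.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))  # step B: score on the untouched hold-out data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):  # step C: one column at a time
        drops = []
        for _ in range(n_repeats):  # assumption: average over several shuffles
            X_perm = X.copy()  # work on a copy, so the original column stays intact
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return baseline, importances


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


# Toy hold-out set: the label depends only on column 0; column 1 is pure noise.
data_rng = np.random.default_rng(42)
X_holdout = data_rng.normal(size=(500, 2))
y_holdout = (X_holdout[:, 0] > 0).astype(int)


def toy_predict(X):
    # Stand-in for a trained model's predict function.
    return (X[:, 0] > 0).astype(int)


baseline, importances = permutation_importance(
    toy_predict, X_holdout, y_holdout, accuracy
)
print(baseline, importances)
```

In this toy setup the model ignores column 1, so shuffling it leaves the score unchanged and its importance is zero, while shuffling column 0 removes most of the model's predictive power. Copying the data before each shuffle plays the role of "bringing the column back to normal".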
References: https://christophm.github.io/interpretable-ml-book/feature-importance.html
The permutation feature importance measurement was introduced by Breiman (2001) for random forests. Based on this idea, Fisher, Rudin, and Dominici (2018) proposed a model-agnostic version of the feature importance and called it model reliance.