BradKML opened this issue 2 years ago
For some descriptions:

- cross-validation with cv=4
- a RandomForest model to make the decision
- LGBMRegressor, an internal function in LightGBM, simple enough
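As a rough illustration of the kind of selector those fragments describe, here is a minimal sketch using scikit-learn only, assuming the `cv=4` fragment refers to 4-fold cross-validated recursive feature elimination and the RandomForest is the estimator whose importances drive the keep/drop decision. The synthetic data and variable names are my own, not from the original comment.

```python
# Sketch: cross-validated feature elimination (cv=4) where a RandomForest's
# importances decide which features to drop at each step.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,  # drop one feature per iteration
    cv=4,    # the cv=4 mentioned above
)
selector.fit(X, y)

print("features kept:", selector.support_.sum())
print("ranking:", selector.ranking_)
```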
paulbkoch commented:

Hi @BrandonKMLee -- Thanks for putting together this list and the descriptions. We'd be open to PRs that implement these alternative algorithms. Our core team is pretty focused on improving EBMs, so we don't have a lot of bandwidth to work on more tangential improvements.
BradKML replied:

@paulbkoch noted with thanks regarding the priority. I also remember how booster-based feature selection was being heavily focused on by everyone:

- https://github.com/scikit-learn-contrib/boruta_py
- https://github.com/Ekeany/Boruta-Shap
- https://github.com/chasedehan/BoostARoota

Also some other small finds regarding MRMR (mutual information; not sure if it overlaps with the other methods here):

- https://github.com/AutoViML/featurewiz
- https://github.com/smazzanti/mrmr
- https://github.com/danielhomola/mifs
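For reference, a hedged sketch of how boruta_py (the first library above) is typically driven, per its README: a RandomForest scores real features against shuffled "shadow" copies and keeps only those that consistently beat the shadows. The synthetic data here is illustrative.

```python
# boruta_py sketch: shadow-feature selection driven by a RandomForest.
import numpy as np
from boruta import BorutaPy
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=15, n_informative=4, random_state=0)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
selector = BorutaPy(rf, n_estimators="auto", random_state=1)
selector.fit(X, y)  # BorutaPy expects numpy arrays, not DataFrames

print("confirmed features:", np.where(selector.support_)[0])
print("tentative features:", np.where(selector.support_weak_)[0])
X_filtered = selector.transform(X)  # keep only confirmed features
```

The mrmr package above is even more compact: its README shows a single call of the form `mrmr_classif(X=df, y=y, K=10)` returning the selected column names.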
P.S. There are other super-repos for feature importance:

- https://github.com/JingweiToo/Wrapper-Feature-Selection-Toolbox
- https://github.com/jundongl/scikit-feature
TL;DR: the original list of yet-to-be-implemented FE algorithms is in https://github.com/parrt/random-forest-importances/issues/54
Seeing https://github.com/interpretml/interpret/issues/364 and https://github.com/interpretml/interpret/issues/218, I notice that some feature importance algorithms are not on the list, particularly LOFO, Morris, and "Unbiased" feature importance. Might wanna check those out?
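LOFO (leave-one-feature-out) in particular is simple to prototype without any extra dependency. Here is a rough sketch of the idea only, not any specific library's implementation; the model, data, and cv=4 choice are illustrative assumptions.

```python
# Rough LOFO sketch: score the model with all features, then re-score with
# each feature left out; the drop in CV score is that feature's importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

baseline = cross_val_score(model, X, y, cv=4).mean()
for i in range(X.shape[1]):
    X_drop = np.delete(X, i, axis=1)  # leave feature i out
    score = cross_val_score(model, X_drop, y, cv=4).mean()
    print(f"feature {i}: importance = {baseline - score:+.4f}")
```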
Bonus: this visualization notebook exists: https://github.com/shionhonda/feature-importance
Currently these are not in the README. Unsure if they have an alternative name:

- LIME, being similar to SHAP (https://github.com/marcotcr/lime). How are they different (missing data test vs. prioritization, significant correlation)?
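To make the LIME-vs-SHAP question concrete, here is a small side-by-side sketch using the standard lime and shap packages. In spirit, LIME perturbs one instance and fits a local linear surrogate to the black-box model, while (Tree)SHAP computes Shapley-value attributions from the tree structure itself. The model and synthetic data are illustrative assumptions.

```python
# Side-by-side sketch: both explain the same prediction, via different routes.
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# LIME: fit a local surrogate around instance X[0]
lime_explainer = LimeTabularExplainer(X, mode="classification")
lime_exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())

# SHAP: Shapley-value attributions for the same instance
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X[:1])
print(shap_values)
```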