interpretml / interpret

Fit interpretable models. Explain blackbox machine learning.
https://interpret.ml/docs
MIT License

Mean Absolute Score : Overall Importance #337

Closed Tejamr closed 2 years ago

Tejamr commented 2 years ago

Hi, can I know how the mean absolute score for each feature (feature importance) is calculated in EBM global explanations? Is there a particular metric or mathematical formula for calculating those probabilities? If yes, please let me know. Please check the PNG below: how are those values calculated, and why has Glucose got a higher score than the other features?

[attachment: newplot]

interpret-ml commented 2 years ago

Hi @Tejamr -- those values are calculated by averaging, across all samples, the absolute value of the per-feature scores in ebm.additive_terms_. The code where this is done is located here:

https://github.com/interpretml/interpret/blob/7033c188db914be53e39a519050f8d1a77fb57d8/python/interpret-core/interpret/glassbox/ebm/ebm.py#L1153

-InterpretML team

Tejamr commented 2 years ago

Can you please explain with any dataset or values as an example? How are the probability values derived? I am not able to work through the source code.

Harsha-Nori commented 2 years ago

Hi @Tejamr,

I think the simplest setting is a boolean feature with a regression dataset. Let's say we're predicting housing price, and the feature of interest is "A/C" (so 1 means the house has air conditioning, 0 means it doesn't). In the training dataset, we have 1,000 houses with A/C and 2,000 without A/C. Let's also assume the model learns -$10,000 for no A/C and +$15,000 for A/C houses.

So to summarize:

Counts: {No A/C: 2,000, A/C: 1,000}
Model Scores: {No A/C: -$10,000, A/C: +$15,000}

These values are the same ones you can see for the feature when you call show(ebm.explain_global()) -- the counts are in the density graph at the bottom, and the model scores are in the main figure. Make sure you are inspecting the learned functions for your features too by using the dropdown in explain_global()!
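As a minimal, hypothetical sketch of reproducing that view (the synthetic data here mirrors the made-up A/C example above; ExplainableBoostingRegressor, show, and explain_global are the actual interpret API):

```python
import numpy as np
import pandas as pd
from interpret import show
from interpret.glassbox import ExplainableBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical toy data: 2,000 houses without A/C, 1,000 with A/C,
# priced around $200k with an A/C effect baked in.
ac = np.array([0] * 2_000 + [1] * 1_000)
price = 200_000 + 25_000 * ac + rng.normal(0, 5_000, size=ac.size)

X = pd.DataFrame({"A/C": ac})
ebm = ExplainableBoostingRegressor()
ebm.fit(X, price)

# The dropdown in this view shows each feature's learned function
# (main figure) and the training-data density (bottom).
show(ebm.explain_global())
```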

The overall feature importance for the "A/C?" feature is simply the average absolute contribution that feature makes on each sample in the training set. So in this case, it would be:

(2,000 * abs(-$10,000) + 1,000 * abs(+$15,000)) / (2,000 + 1,000)

= (2,000 * $10,000 + 1,000 * $15,000) / (2,000 + 1,000)

= $35,000,000 / 3,000 ≈ $11,666

The way to interpret this is that on average, the "A/C?" feature moved predictions in some direction by $11,666. Intuitively this makes sense too -- for every sample, it either reduced the price by $10,000 or increased it by $15,000, so the average importance must be somewhere in between. In this case, it's closer to $10,000 because the majority of the houses do not have A/C.
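A minimal Python sketch of that weighted average (the per-bin counts and scores are the hypothetical A/C numbers from above, not output from a real model):

```python
import numpy as np

# Hypothetical per-bin data for the "A/C" feature from the example.
counts = np.array([2_000, 1_000])         # No A/C, A/C
scores = np.array([-10_000.0, 15_000.0])  # learned contribution per bin, in $

# Weighted mean of absolute contributions across all training samples.
importance = np.sum(counts * np.abs(scores)) / np.sum(counts)
print(importance)  # ~11666.67
```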

It's the exact same procedure for classification, only the scores are in log odds instead of the units of the target. For continuous features, or categorical features with more than two categories, it's the same procedure again, except there are more terms in the weighted average (one for each bin of the feature).

Note that this isn't the only way of calculating feature importance, and none of the options are perfect. For example, this method will not highlight features that learn incredibly strong effects that affect only a tiny number of samples. Other metrics could be the max() - min() of the graph, the change in RMSE from removing a feature, etc. -- we just picked one that seemed reasonably broadly applicable. Happy to clarify further if you have any other questions!

Tejamr commented 2 years ago

Hi @Harsha-Nori, @interpret-ml,

Thanks for your detailed explanation, but the mathematical calculation of the model scores is still a bit confusing to me. I am working on the PIMA Diabetes dataset (Kaggle); the dataset is attached below. I applied EBM to this dataset and got the global explanations shown below.

Can you please walk me through the detailed calculations, i.e. how the probability scores are calculated for each feature? For example, how has Glucose got a higher score than the other features? When I plot correlations, Pregnancies has more impact on the dependent feature, but in the EBM global explanations the pedigree function contributes more. How is this happening?

Detailed mathematical calculations would be very helpful, as I am supposed to submit a POC.

[attachment: newplot (1)]

[attachment: diabetes.csv]

Harsha-Nori commented 2 years ago

For a more mathematical explanation:

An EBM is a generalized additive model, which takes the form:

$$g(E[y]) = \beta_0 + \sum_{k=1}^{K} f_k(x_k)$$

assuming you trained the model on N datapoints with K features each. Here g is a link function that adapts the model to different settings like classification or regression, \beta_0 is an intercept, and each f_k is a function learned by the model which operates on feature x_k (one for each of the K features in the model).

You can think of this like a generalized linear model, except EBMs are allowed to learn a function of each feature instead of a coefficient for each feature. When the model makes predictions, just like in linear models, we get one contribution per feature, sum them up, and pass them through the link function to get a final answer.

In classification, the link function is the logit link, which means that the units of the overall feature importances and each shape function are in log odds (just like the coefficients in logistic regression). This is because log odds are additive, but probabilities aren't. Many classification models, including logistic regression, other boosted trees, neural nets, etc. operate in this way, because directly learning in probability space is often infeasible. You can turn log odds into probabilities by using the logistic function, which we use internally as well.
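As an illustration with made-up numbers, here is how a summed log-odds score becomes a probability via the logistic function:

```python
import numpy as np

# Hypothetical per-feature log-odds contributions for one sample,
# e.g. read off each feature's shape function in explain_global().
intercept = -1.2
contributions = np.array([0.8, -0.3, 0.5])  # one term per feature

log_odds = intercept + contributions.sum()      # additive in log-odds space
probability = 1.0 / (1.0 + np.exp(-log_odds))   # logistic (inverse logit)
print(log_odds, probability)                    # -0.2 -> ~0.45
```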

The feature importance for a specific feature k is then calculated as:

$$\text{Importance}_k = \frac{1}{N} \sum_{i=1}^{N} \left| f_k(x_{i,k}) \right|$$

All this means is that we take all of the data in the training set, select only the column of data for the feature we're interested in, pass it through the shape function that EBM learned (which you can visualize with the dropdown in ebm.explain_global()), take the absolute value, and average everything to get the final answer.

We do this calculation once for each feature independently, and then just sort the final scores we get for the visualization.
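A minimal numpy sketch of that formula, assuming contribs is an (N, K) matrix of per-sample, per-feature scores f_k(x_ik) (how you extract that matrix from a fitted EBM depends on the library version, so it is left as an input here):

```python
import numpy as np

# Hypothetical (N, K) matrix: one row per training sample, one column
# per feature, holding the score f_k(x_ik) from each shape function.
contribs = np.array([
    [ 0.8, -0.3,  0.5],
    [-0.4,  0.1,  0.5],
    [ 0.8, -0.3, -0.2],
])

# Mean absolute contribution per feature; sorting these gives the
# ordering shown in the global importance bar chart.
importances = np.abs(contribs).mean(axis=0)
order = np.argsort(importances)[::-1]
print(importances, order)  # [0.667 0.233 0.4], feature order [0, 2, 1]
```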

Our calculation is identical to what other packages like SHAP do -- you may also find this description of aggregate feature importance from the Interpretable Machine Learning book to be helpful: https://christophm.github.io/interpretable-ml-book/shap.html#shap-feature-importance


To answer this question:

For example, how has Glucose got a higher score than the other features? When I plot correlations, Pregnancies has more impact on the dependent feature, but in the EBM global explanations the pedigree function contributes more. How is this happening?

EBMs use more than simple correlation metrics to build the shape functions for each feature. We're learning functions in parallel on all the features at the same time and can make opinionated decisions about how to allocate credit to each feature. While your pregnancy feature may be more (linearly) correlated to your dependent feature, EBMs may have chosen to fit the dependent variable using other features more heavily. This can happen when your features have correlation between themselves too, which is almost always true in practice. It could also be that "Pregnancies" being non-zero is relatively rare in your dataset, which means that it may not rank highly on the overall feature importance measure even though it's an important feature for the subset of the population that is pregnant.

Hope this helps!

Tejamr commented 2 years ago

Hi @Harsha-Nori, @interpret-ml,

Still the same confusion for me. I am not able to get the probability values, as I am a noob. Can you please calculate those values and share a notebook or worked example on paper here? I need the calculations from scratch, i.e.:

1. How are the additive terms calculated?
2. How are the probability values calculated?
3. How are values passed to the shape function, and how are they converted via the logit link function?
4. How do we get each and every value? I am asking about the internal structure of EBM, with mathematical calculations on the diabetes dataset.

Here's the dataset for your reference.

Note: please do the calculations on the diabetes dataset and share the code or paper here.

[attachment: diabetes.csv]

I am not getting a single point regarding the calculations. It would be a great benefit to me if you could help with this.

Tejamr commented 2 years ago

Hi @Harsha-Nori, @interpret-ml,

Can you please explain what shape function is used here, and how I should pass values to it?

Note: "All this means is that we take all of the data in the training set, select only the column of data for the feature we're interested in, pass it through the shape function that EBM learned (which you can visualize with the dropdown in ebm.explain_global()), take the absolute value, and average everything to get the final answer."

I am asking about the explanation above.

Harsha-Nori commented 2 years ago

Hi @Tejamr,

No worries. I highly recommend watching the first ~20 minutes of this video: https://www.youtube.com/watch?v=2YKtNYBuojE where Rich Caruana explains what EBMs and GAMs are, how we use boosted decision trees to learn the graphs, and how we make a single prediction with a graph.

This chapter on GAMs from the interpretable machine learning book might be helpful too: https://christophm.github.io/interpretable-ml-book/extend-lm.html . I truly think the explanations above will make more sense to you once you've understood how GAMs and other additive models work!

Tejamr commented 2 years ago

Hi @Harsha-Nori ,

Thank you for your detailed explanation. But can I know how we get the additive terms for each feature? For example, ebm.additive_terms_ per feature gives a set of arrays with a bunch of point values. When I went through the code, it looked like averaging the absolute additive terms gives the overall (global) model feature importances. Can I know where those additive terms come from? Is there a particular method or mathematical calculation behind them? If yes, please let me know.

Tejamr commented 2 years ago

Hi @Harsha-Nori ,

In the above comment you mentioned that we have to take a single column of values and pass it through the shape function, and you gave a formula as well. I am using the EBM algorithm; can I know what the shape function is in this case, and what formula the values are passed through? Kindly let me know about these statistics.