Closed: 1989-juliana-h closed this issue 4 years ago.

Hello, and thanks a lot for implementing the Shapley decomposition of R^2 in R. I've tried to understand and implement all of the steps you mention in your paper. The most important part for me is the calculation of sigma unique (which accounts for feature correlations). However, when I apply the formula in the paper, I get values higher than one. So I was wondering: do I have to replace the actual y with the predicted y in the numerator of the formula?
Fair question. Thanks for bringing it up. I have this package pretty torn apart behind the scenes as I'm reworking it to incorporate causality. That doesn't answer your question, but give me a day or three and I'll code up R^2 and sigma with a worked example in the README here. I could answer off the top of my head, but there's a good chance that I'd mistype.
Alright, where to start? A lot of this is a note-to-self. The code used to calculate sigma in the report, and now returned from the r2() function, is spot on (testing and contributions always welcome); however, equation 7 in the paper is incorrect. Sorry for the confusion; it's a bit of a time waster when that happens. I'll have to fix it and re-upload. On that note, the formula below is the one we want.
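The formula image didn't survive in this thread, so the display below is a reconstruction from the description that follows; treat the notation as my assumption rather than a verbatim copy of the corrected paper. Here \phi_j is feature j's Shapley effect for each observation, \hat{y} is the full-model prediction, and \bar{y} is the intercept-only baseline:

```latex
\sigma_{\mathrm{unique}}
  = \frac{\sum_{j=1}^{p} \Big[ \operatorname{Var}\big(y - (\hat{y} - \phi_j)\big)
          - \operatorname{Var}\big(y - \hat{y}\big) \Big]}
         {\operatorname{Var}\big(y - \bar{y}\big) - \operatorname{Var}\big(y - \hat{y}\big)}
```

Mapping this onto the layout referenced below: Var(y - (ŷ - φ_j)) is the top left, Var(y - ŷ) is the top right, and Var(y - ȳ) is the bottom left.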
There may be some degenerate cases that I've missed, but the following should hold:
The variance between y and the model intercept/baseline (bottom left) should always be the largest of the three terms. Sigma would only turn negative if the model's predictions were worse than an intercept-only model, which isn't likely.
The top left, implemented as a group_by(feature) in the package, should always be greater than or equal to the error variance on the top right, because each feature's predictive contribution is being removed in turn using its unique Shapley effect. The general idea: the denominator here is the model-explained variance, so if the features are well and truly uncorrelated, you should be able to tear the full model down, feature by feature, and end up back at the intercept-only model, giving a sigma unique of 1. If the features are highly correlated, then as the full model is torn down with the Shapley predictions you don't get anywhere close to the intercept-only model, because each feature isn't getting much unique explanatory credit.
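Here's a minimal, self-contained R sketch of that tear-down logic, under stated assumptions: the long-format table shap_long with columns index, feature, and shap_effect is a hypothetical structure of my own (not the package's internals), and the linear-model shortcut for the Shapley effects (beta_j * (x_j - mean(x_j)), valid for independent features) stands in for whatever explainer you'd actually use.

```r
library(dplyr)

set.seed(1)
n <- 1000

# Toy data: two (nearly) uncorrelated features and a linear model.
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + 1 * x2 + rnorm(n)

fit    <- lm(y ~ x1 + x2)
y_pred <- fitted(fit)

# For a linear model with independent features, feature j's Shapley
# effect per observation is beta_j * (x_j - mean(x_j)).
shap_long <- bind_rows(
  tibble(index = 1:n, feature = "x1",
         shap_effect = coef(fit)["x1"] * (x1 - mean(x1))),
  tibble(index = 1:n, feature = "x2",
         shap_effect = coef(fit)["x2"] * (x2 - mean(x2)))
)

error_var_full     <- var(y - y_pred)   # top right: full-model error variance
error_var_baseline <- var(y - mean(y))  # bottom left: intercept-only error variance

# Top left: remove each feature's Shapley effect from the full prediction,
# one feature at a time, and measure how much the error variance grows.
unique_contrib <- shap_long %>%
  group_by(feature) %>%
  summarize(
    unique_var = var(y[index] - (y_pred[index] - shap_effect)) - error_var_full,
    .groups = "drop"
  )

# Sigma unique: summed unique contributions over the model-explained variance.
sigma_unique <- sum(unique_contrib$unique_var) /
  (error_var_baseline - error_var_full)
sigma_unique
```

With these uncorrelated features, sigma_unique comes out at roughly 1, as described above; make x1 and x2 highly correlated and it sinks toward 0, because neither feature earns much unique credit.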
I'm certain that this maps onto a proof about part/partial regression coefficients because conceptually it's the same, but I don't have a reference handy.
In this example, the 40 (the error variance after one feature's Shapley effect is removed) should never go above the 50 (the intercept-only error variance) or below the 10 (the full model's error variance), except maybe marginally when using stochastic Shapley values.
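Plugging that reading of the numbers into the formula above (an assumption on my part, since the example image isn't reproduced here), that feature's unique share of the explained variance would be:

```latex
\frac{40 - 10}{50 - 10} = \frac{30}{40} = 0.75
```

i.e., comfortably between 0 (no unique credit) and 1 (fully unique credit).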