DemGrg opened 6 years ago
Yes, actually there is a pretty cool reason why you do not want to have beta*value as separate contributions (see below)
In the `broken` object, the values are calculated as `beta*centered(value)`.
This makes the contributions invariant to shifts in the X variables: you will get the same breakDown plot whether temperature is measured in Celsius or in Fahrenheit. The beta coefficients take care of the scale, but the location has to be handled separately. Also, since the values are centered, the intercept is shifted as well.
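A small sketch of the Celsius/Fahrenheit argument on simulated data. The data and the helper functions `centered_contrib` and `raw_contrib` are hypothetical and only illustrate the idea; they are not part of breakDown. Fitting the same model in both units, the centered contribution `beta * (value - mean(value))` comes out the same, while the raw `beta * value` shifts by a unit-dependent constant.

```r
set.seed(1)
# Hypothetical data: y depends linearly on temperature
temp_c <- runif(50, -10, 35)
y <- 3 + 0.8 * temp_c + rnorm(50)
d_c <- data.frame(y = y, temp = temp_c)               # Celsius
d_f <- data.frame(y = y, temp = temp_c * 9 / 5 + 32)  # Fahrenheit

fit_c <- lm(y ~ temp, data = d_c)
fit_f <- lm(y ~ temp, data = d_f)

# Centered contribution of observation i: beta * (value - mean(value))
centered_contrib <- function(fit, data, i) {
  unname(coef(fit)["temp"] * (data$temp[i] - mean(data$temp)))
}
# Raw contribution of observation i: beta * value
raw_contrib <- function(fit, data, i) {
  unname(coef(fit)["temp"] * data$temp[i])
}

# About 0: the centered contribution does not depend on the unit
centered_contrib(fit_c, d_c, 1) - centered_contrib(fit_f, d_f, 1)
# Equals -beta_C * 160 / 9 (beta_C = Celsius slope): the raw one does
raw_contrib(fit_c, d_c, 1) - raw_contrib(fit_f, d_f, 1)
```

The slope absorbs the 9/5 scale factor, and centering removes the +32 offset, which is exactly the split between scale and location described above.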
It is easy to get such individual contributions. In the `breakDown` package they are obtained directly (no extra calculations are needed) with `predict.lm(model, newdata, type = "terms")`.
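A minimal check of what `type = "terms"` returns for a plain linear model with numeric predictors, on hypothetical simulated data: each column is `beta_j * (x_j - mean(x_j))`, with the means taken from the training data, and the `"constant"` attribute plus the terms gives the ordinary prediction.

```r
set.seed(42)
# Hypothetical training data with two numeric predictors
d <- data.frame(x1 = rnorm(30), x2 = runif(30))
d$y <- 1 + 2 * d$x1 - 3 * d$x2 + rnorm(30, sd = 0.1)
model <- lm(y ~ x1 + x2, data = d)
new_obs <- data.frame(x1 = 0.5, x2 = 0.2)

tt <- predict(model, new_obs, type = "terms")

# Each column is beta_j * (x_j - mean of x_j in the training data)
manual <- coef(model)[c("x1", "x2")] *
  (unlist(new_obs) - colMeans(d[, c("x1", "x2")]))
all.equal(unname(tt[1, ]), unname(manual))

# The "constant" attribute plus the terms gives the usual prediction
all.equal(unname(attr(tt, "constant")) + sum(tt), unname(predict(model, new_obs)))
```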
Thank you for the explanation! May I suggest giving users the option to use either the centered or the regular (uncentered) x values, and adding some explanation to the documentation? This is a great chart, but it is confusing without an explanation of `type = "terms"`.
Yes, some documentation is required. Winter semester has just ended so I will have some time to work on it.
Dear @pbiecek,
Following up on @alathrop's suggestion, it would be great to have an option that shows the direct contribution of each term (beta*value) rather than the centered values.
I completely understand your point of view. But in other contexts such a plot would be relevant, e.g. for pedagogical purposes. When teaching, I often need to explain to my students how a single prediction is obtained from a model, in particular when explaining how to interpret interactions.
Thanks for this package
Maybe some code could be helpful. I have tried the following.
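betas <- function(object, newdata) {
  # Terms of the fitted model without the response variable
  Terms <- delete.response(terms(object))
  # Design matrix for newdata, reusing the factor levels and contrasts from fitting
  mf <- model.frame(Terms, newdata, xlev = object$xlevels)
  mm <- model.matrix(Terms, mf, contrasts.arg = object$contrasts)
  ass <- attr(mm, "assign")          # term index of each column (0 = intercept)
  tl <- attr(Terms, "term.labels")
  co <- coef(object)
  # beta_j * x_j column by column; `co * mm` would recycle co down the rows
  # and mix up the coefficients whenever newdata has more than one row
  pred <- sweep(mm, 2, co, `*`)
  ret <- matrix(NA_real_, nrow = nrow(mm), ncol = length(tl),
                dimnames = list(rownames(mm), tl))
  for (i in seq_along(tl)) {
    # Sum the products of all columns belonging to term i (NA = aliased coefficient)
    ret[, i] <- rowSums(pred[, ass == i, drop = FALSE], na.rm = TRUE)
  }
  # Intercept contribution (0 if the model has no intercept)
  attr(ret, "constant") <- rowSums(pred[, ass == 0, drop = FALSE], na.rm = TRUE)
  ret
}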
At the beginning of `broken.glm`, simply use `ny <- betas(model, new_observation)` instead of `predict`, and the rest of the function will keep working.
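A sketch for sanity-checking raw contributions of the kind `betas()` produces, here on a hypothetical model with an interaction (the teaching case mentioned above). The helper `raw_terms` is illustrative and not part of breakDown; it groups the `beta * value` products by term through the `"assign"` attribute of the model matrix. Two properties are checked: intercept plus raw contributions reproduces the prediction, and raw and centered (`type = "terms"`) contributions differ by the same per-term shift in every row.

```r
set.seed(7)
# Hypothetical data with an interaction between x1 and x2
train <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
train$y <- 1 + 2 * train$x1 - train$x2 + 0.5 * train$x1 * train$x2 +
  rnorm(40, sd = 0.2)
model <- lm(y ~ x1 * x2, data = train)
new_obs <- data.frame(x1 = c(0.3, -1.2), x2 = c(1.1, 0.4))

# Hypothetical helper (not part of breakDown): raw beta * value per term
raw_terms <- function(object, newdata) {
  Terms <- delete.response(terms(object))
  mf <- model.frame(Terms, newdata, xlev = object$xlevels)
  mm <- model.matrix(Terms, mf, contrasts.arg = object$contrasts)
  prods <- sweep(mm, 2, coef(object), `*`)  # beta_j * x_j for every column
  prods[is.na(prods)] <- 0                   # aliased coefficients contribute nothing
  ass <- attr(mm, "assign")
  labels <- attr(Terms, "term.labels")
  member <- outer(ass, seq_along(labels), `==`) * 1  # 1 if column j belongs to term i
  out <- prods %*% member
  colnames(out) <- labels
  attr(out, "constant") <- rowSums(prods[, ass == 0, drop = FALSE])
  out
}

raw <- raw_terms(model, new_obs)
centered <- predict(model, new_obs, type = "terms")

# Intercept plus raw contributions reproduces the prediction
all.equal(unname(attr(raw, "constant") + rowSums(raw)),
          unname(predict(model, new_obs)))
# Raw and centered contributions differ by the same shift in every row
shift <- raw - centered
all.equal(shift[1, ], shift[2, ])
```

The per-term shift is `beta_j` times the training mean of the corresponding model-matrix column, which is the centering that `type = "terms"` applies.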
Would you consider adding such an option?
I have prepared a pull request, just in case.
Thanks, merged. Rendered examples are here: https://pbiecek.github.io/breakDown/reference/broken.lm.html
thanks
Hi, I don't understand how the `broken` function calculates the coefficients (or is something off?).
With `lm`, this is my test result:
Call:
lm(formula = TotalCharges ~ ., data = data_in_test)

Residuals:
     Min       1Q   Median       3Q      Max
-1943.33  -453.71   -94.64   490.26  1887.26

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -2162.4583    21.9717 -98.420  < 2e-16
MonthlyCharges    36.1234     0.3080 117.301  < 2e-16
tenure            65.3606     0.3683 177.476  < 2e-16
SeniorCitizen    -86.7050    24.3449  -3.562 0.000371
Test user: `-2162.4583 + (data_in_test[analysed_user,]$MonthlyCharges*36.1234) + data_in_test[analysed_user,]$tenure*65.3606 + data_in_test[analysed_user,]$SeniorCitizen*(-86.7050)`
[1] 721.2045
While `broken` gives a different result (you can see that the intercept is different).
Surely one would expect the contributions in a waterfall plot to be simply Y = intercept + beta*value + ..., taken from the summary output?
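A sketch of where the different intercept plausibly comes from, assuming the plot uses the `type = "terms"` decomposition explained at the top of the thread. The data below are a synthetic stand-in with the same column names, not the original `data_in_test`. With centered contributions, the baseline is the `lm` intercept plus `sum(beta * mean(value))`, which for a least-squares fit with an intercept equals the average fitted value.

```r
set.seed(123)
n <- 200
# Synthetic stand-in for data_in_test: hypothetical values, NOT the original data
sim <- data.frame(
  MonthlyCharges = runif(n, 20, 110),
  tenure = sample(0:72, n, replace = TRUE),
  SeniorCitizen = rbinom(n, 1, 0.2)
)
sim$TotalCharges <- -2000 + 36 * sim$MonthlyCharges + 65 * sim$tenure -
  90 * sim$SeniorCitizen + rnorm(n, sd = 500)
fit <- lm(TotalCharges ~ ., data = sim)

vars <- c("MonthlyCharges", "tenure", "SeniorCitizen")
b <- coef(fit)
xbar <- colMeans(sim[, vars])

# Intercept used by the centered decomposition (type = "terms")
tt <- predict(fit, sim[1, ], type = "terms")
centered_intercept <- unname(attr(tt, "constant"))

# It equals the lm intercept plus sum(beta * mean(value)) ...
all.equal(centered_intercept, unname(b["(Intercept)"]) + sum(b[vars] * xbar))
# ... which is the same as the average fitted value
all.equal(centered_intercept, mean(fitted(fit)))
```

So the raw intercept and the plotted baseline differ by `sum(beta * mean(value))`, and each variable's bar is shifted by the matching `beta * mean(value)` in the opposite direction.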