Closed: alexhallam closed this issue 5 years ago.
Good question. Let me use a simpler dataset with just one binary variable.
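The true model is `y = x + rnorm()`, where `x` is a binary variable:

```r
binary <- rbinom(1000, 1, 0.5)  # binary predictor, 1 with probability 0.5
y <- binary + rnorm(1000)       # true model: y = binary + N(0, 1) noise
fit <- lm(y ~ binary)
```

The fitted model is `y = 0.019 + 0.966 * binary`.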
In breakDown, effects are measured relative to the average model response. Here the average response is roughly 0.5 (about `0.019 + 0.966 * 0.5`, since `binary` is 1 about half the time). For `binary = 1` the model response is close to 1, so the contribution of `binary = 1` is about +0.5.

For `binary = 0` the model response is close to 0, so the contribution of `binary = 0` is about -0.5.
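To make the arithmetic concrete, here is a minimal base-R sketch that reproduces these numbers without breakDown. It assumes that, for an `lm` model, a break-down contribution equals the coefficient times the distance of the predictor from its mean. This is what `predict(type = "terms")` returns, and the average model response is stored in its `"constant"` attribute. The seed is added only for reproducibility, so the exact values will differ slightly from the run above.

```r
# Sketch (assumption: breakDown's lm contributions = coef * (x - mean(x))).
set.seed(42)                                 # arbitrary seed, for reproducibility only
binary <- rbinom(1000, 1, 0.5)
y <- binary + rnorm(1000)
fit <- lm(y ~ binary)

new_obs <- data.frame(binary = c(1, 0))      # one observation with binary = 1, one with 0
contrib <- predict(fit, newdata = new_obs, type = "terms")

attr(contrib, "constant")                    # average model response, roughly 0.5
contrib                                      # roughly +0.5 for binary = 1, -0.5 for binary = 0

# Same numbers by hand: coefficient times distance from the mean of binary
coef(fit)[["binary"]] * (new_obs$binary - mean(binary))

# Average response plus contribution recovers the ordinary prediction
attr(contrib, "constant") + contrib[, "binary"]
predict(fit, newdata = new_obs)
```

The contributions are centred, so an observation at 0 still gets a non-zero contribution whenever 0 is below the predictor's average.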
In your case, the contribution of -0.146 for `versicolor = 0` means that `versicolor = 0` gives a lower prediction than the average effect of `versicolor` (since it is a binary variable, that average is taken over its observed values 0 and 1, but this does not change the interpretation).
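A sketch of the same calculation for a 0/1 dummy like `versicolor`. The original reprex is not shown, so the model below (with a copy of `iris` called `iris2` and hand-made `versicolor` and `setosa` dummies) is a hypothetical stand-in and will not necessarily reproduce the -0.146. Because one third of the iris rows are versicolor, an observation with `versicolor = 0` sits below the average, so its contribution has the opposite sign of the coefficient.

```r
# Hypothetical model: illustrative only, not the model from the original reprex.
iris2 <- iris
iris2$versicolor <- as.numeric(iris2$Species == "versicolor")  # 0/1 dummy
iris2$setosa     <- as.numeric(iris2$Species == "setosa")      # 0/1 dummy

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + versicolor + setosa, data = iris2)

obs <- iris2[101, ]                        # a virginica flower: versicolor = 0, setosa = 0
contrib <- predict(fit, newdata = obs, type = "terms")
contrib

# The versicolor contribution is its coefficient times (0 - share of versicolor rows),
# i.e. it is measured against the average effect of versicolor, not against 0.
coef(fit)[["versicolor"]] * (0 - mean(iris2$versicolor))
```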
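By the way, you can use the `Species` variable directly in the model:

```r
library(breakDown)

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
no <- iris[1, ]          # observation to explain
br <- broken(fit, no)    # break-down of the prediction for `no`
plot(br)
```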
Now it is easier to see that `Species = virginica` results in lower predictions. In the model with separate dummy variables, this negative effect of virginica was only partially visible, spread over the `versicolor = 0` and `setosa = 0` contributions.
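With `Species` as a factor, the three levels form a single term, so the break-down shows one `Species` contribution instead of two dummy contributions. The base-R sketch below, again assuming that breakDown's decomposition for `lm` matches `predict(type = "terms")`, shows that this contribution is the virginica effect minus the average `Species` effect over the data. Note that `iris[1, ]` is a setosa flower, so the sketch picks row 101, which is virginica.

```r
# Sketch: with Species as a factor, predict(type = "terms") gives one Species term.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)

obs <- iris[101, ]                         # rows 101-150 of iris are virginica
contrib <- predict(fit, newdata = obs, type = "terms")
contrib

# Species effect per level (setosa is the reference level, so its effect is 0)
b <- coef(fit)
species_effect <- c(setosa     = 0,
                    versicolor = b[["Speciesversicolor"]],
                    virginica  = b[["Speciesvirginica"]])

# Average Species effect over all rows, then the virginica effect relative to it
avg_species_effect <- mean(species_effect[as.character(iris$Species)])
species_effect[["virginica"]] - avg_species_effect
```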
Hope it helps.
I think using `Species` in the model results in plots that make much more sense. Thank you!
I am having difficulty interpreting the output when the outcome is continuous and the predictors are Boolean.
If I think of effects in terms of a linear model, I would switch coefficients on or off depending on whether each predictor is 1 or 0. breakDown does not seem to do this.
In the example below I have chosen a point that has 0 for these predictors. Again, in a linear-model prediction this would simply zero out the terms for these predictors. With breakDown, however, I see a negative effect for `versicolor` and `setosa`. How am I supposed to interpret the output in this situation?
Is there a way to show that, since these values are 0 for this observation, they do not contribute to the final prediction?
Created on 2018-10-19 by the reprex package (v0.2.0).
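For contrast, the linear-model view described in the question (switching coefficients on or off) can be sketched in base R by multiplying each coefficient by the observation's value, with the intercept as the baseline. This is not what breakDown computes; it is an illustrative alternative, and the model (`iris2` with hand-made `versicolor` and `setosa` dummies) is hypothetical.

```r
# Sketch of the "coefficients on or off" view: each term is coef * x,
# with the intercept as baseline. Model and observation are illustrative.
iris2 <- iris
iris2$versicolor <- as.numeric(iris2$Species == "versicolor")  # 0/1 dummy
iris2$setosa     <- as.numeric(iris2$Species == "setosa")      # 0/1 dummy
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + versicolor + setosa, data = iris2)

obs <- iris2[101, ]                                  # virginica: versicolor = 0, setosa = 0
x <- model.matrix(delete.response(terms(fit)), obs)  # 1 x p design row, incl. intercept
raw_contrib <- coef(fit) * x[1, ]                    # coef * value; zero predictors give 0
raw_contrib

# Summing the pieces gives the usual prediction
sum(raw_contrib)
predict(fit, newdata = obs)
```

Here predictors equal to 0 contribute exactly 0, but the baseline becomes the intercept, i.e. the prediction for a virginica flower with zero sepal width and petal length, rather than the average model response that breakDown uses.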