Partial residuals with transformed response

pbreheny commented 10 years ago

(This conversation took place on e-mail, but I'm reproducing it here because it's a common question)

Santiago Benitez-Vieyra (Universidad Nacional de Córdoba):

I think I found a bug when using visreg with poisson or quasipoisson distributions, both in glm and gam. With binomial distribution it works fine

library(visreg)
utils::data(Insurance, package="MASS")
fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)
visreg(fit, scale="response")

# compare the y scale with
plot(Insurance$Holders, predict(fit, type="response"))

pbreheny commented 10 years ago

Hi Santiago,

No, visreg is doing what it is supposed to do. If you submit

visreg(fit)

You can see that there are some large outliers; when you transform to the response scale (i.e., exponentiate), the largest outlier dominates the visual scale.

You can see that visreg is displaying the correct fit by:

visreg(fit, scale="response", ylim=c(0,1000))

or

visreg(fit, scale="response", partial=FALSE)

Yes, these plots look prettier, but the default is how it should be, alerting the modeler to the fact that his model is a horrible fit to the data.

pbreheny commented 10 years ago

Hi Patrick, I'm still a bit confused, because there are not observed values above 400 in the Insurance data, thus I did not expect residuals higher than 400 in the response scale. Then I noticed that visreg made the residual plot using something similar to

plot(Insurance$Holders, exp(predict(fit) + resid(fit)))

Is that correct?

thanks, Santiago

pbreheny commented 10 years ago

Hi Santiago,

So the more I think about it, this is actually a pretty interesting issue, so I'll give a fairly in-depth response:

1) First of all, yes:

plot(Insurance$Holders, exp(predict(fit) + resid(fit)))

is essentially what visreg is doing here. This might seem strange in this particular case, since we could just plot Holders vs. Claims. However, in general, the idea of plotting Holders vs. Claims cannot be extended to the multiple regression case. The idea is that we are plotting the relationship between y and x while keeping all the other variables constant -- in the actual observations, however, those other variables are changing, so plotting the observations directly would be misleading. The only way to extend the idea to multiple regression is using model-based adjustments like transform(predict+residual).

2) To see why there's this extreme outlier, consider the regular Poisson model:

fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)

If you look at observation 12, the model predicts on the basis of the number of holders for that observation that there should be 113 claims. In fact, there are 233 claims. This is way beyond what is possible with a Poisson distribution: the probability of having an observation that extreme is 3x10^-23, or about 1 in 30 sextillion. So to me, a partial residual plot like you get from visreg should look pretty crazy and unbelievable, because the model is obviously not compatible with the data.

3) However, what if we fit a Quasipoisson model? On the surface of things, this should help out with item #2: the earlier model said that observation 12 should follow a distribution with mean 113 and standard deviation 11, which obviously didn't work, but if we add a dispersion parameter, the distribution now has standard deviation 45, which is a bit more compatible with the observation of 233. However, it's hard to make this notion rigorous, since the Quasipoisson isn't actually a real distribution (i.e., you can't calculate the probability of seeing a point as extreme as 233).

Furthermore, the residuals for a Quasipoisson model are exactly the same as the residuals for a Poisson model. So, in light of 1, you get the exact same plot. I agree that this is strange and doesn't seem like it should be the case, but I am not in charge of defining residuals for quasilikelihood models -- I am just using the residuals as calculated by residuals.glm. I think the take-home message is that residuals aren't really all that well-defined for a quasilikelihood model, because residuals are based on distributions and quasipoisson is not an actual distribution. There may be something ad-hoc you could do in visreg to make quasipoisson look better, but on the other hand, there are just fundamental limitations to quasilikelihood models.

4) To see that this all works better when we have real distributions, we can check what we get with a negative binomial model (which, like the quasipoisson, is a count model that allows overdispersion, but is based on an actual distribution). The visreg plots you get here look much more reasonable:

require(MASS)
fit.nb <- glm.nb(Claims ~ Holders, data = Insurance)
visreg(fit.nb, scale="response", ylim=c(0,1000))

Note that the plot still don't look good, in the sense that the regression function blows up for Holders > 2000; the relationship is obviously not linear on the log scale and no model based on this is going to fit the data well. However, the residuals themselves look good and have the right scale -- no residuals at 120000 or anything like that (you can remove the ylim to verify this).

So anyway, I'm curious what you think of 1-4 above. If you have any thoughts, I'd be interested to hear them.

pbreheny commented 10 years ago

Hi Patrick, it is really an interesting issue. Taking aside the problem of Poisson, I think that my problem with this graph comes from the difference between partial and common residuals. In a linear model like

y = a + b1_x1 + b2_x2 + e

I think that a partial residual is

p.e = e + a + b1x1

and a partial residual plot should be

plot(x1, pe)

If instead of having a multiple regression model, we have a simple model, I expect that partial residuals equals observed values. I think that visreg has this behavior, because in a common linear model

fit3 <- lm(Claims ~ Holders, data = Insurance)
visreg(fit3)

is the same than

plot(Insurance$Holders, fit3$fitted)
points(Insurance$Holders, fit3$fitted+fit3$residuals)

The problem with generalized models is that I don't know how to use the link function because exp(a + b1*x1 + e) does not equal the observed values if e are deviance or pearson residuals. You can see it using

fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)
plot(Insurance$Claims, exp(predict(fit) + resid(fit)))

(Using trail and error, I discovered that it works using "response" without exp. However, I'm not sure what "response" residuals are...)

fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)
plot(Insurance$Claims, exp(predict(fit)) + resid(fit, type ="response"))

Comming back to the insurance data, I think it is useful that visreg informs about extreme outlyers. However, using exp(predict + resid), only extreme positive values of the residuals are being highlighted.

fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)
visreg(fit, scale="response")
points(Insurance$Holders[12], exp(predict(fit)[12]+resid(fit)[12])) #extreme positive residual
points(Insurance$Holders[8], exp(predict(fit)[8]+resid(fit)[8]), col = "red") #extreme negative residual

In summary I think the whole problem is how to use the link function (especially log/exp) in this cases.

regards, Santiago

pbreheny commented 10 years ago

Hi Santiago,

The root of the problem is that residuals are constructed for and make sense on the scale of the linear predictors, not the outcome scale. This raises the question, "What should visreg plot if a user requests a plot on the scale of the original response?" My original thinking was to plot the linear-predictor residuals, transformed onto the scale of the response, which is what visreg currently does. I still think that this is the best thing to plot if you have to plot partial residuals on a response-scale plot. However, our conversation is causing me to think that plotting these residuals is doing more harm than good.

If a modeler wants to inspect their residuals, they should do so on the scale of the linear predictor, always. I can't imagine any situation in which inspecting residuals on the scale of the response would be better than inspecting them on the scale of the linear predictor, and often, as in the case below, with log/exp scales, looking at residuals on the response scale is clearly problematic.

So with that in mind, I've decided to change the default options in visreg. For the example below, the code

fit<-glm(Claims ~ Holders, family = quasipoisson, data = Insurance)
visreg(fit, scale="response")

will no longer plot partial residuals. The output will look like what you get from:

visreg(fit, scale="response", partial=FALSE, rug=TRUE)

You can still get the partial residuals if you want them, but you would have to explicitly specify partial=TRUE. It's questionable whether they are useful in this situation -- they have their flaws, but I think any notion of partial residual on this scale is problematic -- but in my experience can be helpful to look at. However, for the sake of inspecting residuals and model fit,

visreg(fit)

which illustrates the fit, with partial residuals, on the scale of the linear predictor, is by far the best approach.

(NOTE: These changes took effect in visreg 2.0-6)

pbreheny commented 10 years ago

Hi Patrick. I agree that residuals are informative in the link scale, hence your desition is the best, and I look forward to the new version of visreg.

regards, Santiago

pbreheny / visreg

Partial residuals with transformed response #1