Partial residuals on wrong scale with Tweedie distribution in mgcv

pbreheny / visreg

Visualization of regression functions

http://pbreheny.github.io/visreg/

61 stars 18 forks source link

Partial residuals on wrong scale with Tweedie distribution in mgcv #51

Open lukembrowne opened 6 years ago

lukembrowne commented 6 years ago

Thanks for your work on visreg. It's been really useful

I've run into a weird issue when trying to visualize results from a gam from the mgcv package, specifically when using the Tweedie distribution, which has a log link function. It seems the y axis for the partial residuals is off, producing estimates that are way beyond the data. Here's a reproducible example


library(mgcv)
library(visreg)

dat <- runif(0, 1000, n = 1000)
type <- factor(rep(c(1,2,3,4,5), 200))

gam <- bam(dat ~ s(type, bs = "re"), family = "tw")

visreg(gam)

# Changing to the scale to response shows how off the partial residuals are
visreg(gam, scale = "response", partial = TRUE)

pbreheny commented 6 years ago

This is a common question. The issue is that partial residuals are calculated and are meaningful on the scale of linear predictors. Once you transform them, I really can't guarantee that they have any meaning. Because they are potentially misleading, they are explicitly turned off in visreg when you transform the response.

You can read more about it in this thread if you would like.

pbreheny commented 6 years ago

I will say, though, that your example is interesting; it's compelling that even in such a simple case, the transformed residuals are so far off from the original data.

One thing that's on my to-do list for visreg is to allow users to pass options to the residuals function for the model object; perhaps another type of residual would work better on the transformed scale. Here's an inelegant workaround for your example:

residuals.bam <- function(fit) residuals.gam(fit, type="scaled.pearson")
visreg(gam, scale = "response", partial = TRUE)

At first glance, this seems to give more reasonable-looking results, although again, I think there's a fundamental issue here that the model is simply not being fit on the original scale, so partial residuals aren't well-defined.

pbreheny commented 6 years ago

You know what, maybe I'll re-open this issue.

Certainly, it would be nice to have the option to choose residual types with something like

visreg(gam, scale = "response", partial = TRUE, resid.args=list(type="scaled.pearson"))

But beyond that, maybe it would nice to have a completely different approach for plotting residuals on the response scale and use raw residuals rather than transforming residuals from the linear predictor scale. My position in the past has been that if you want to learn about model fit by looking at residuals, you should be looking on the linear predictor scale, not the transformed scale, but perhaps there is some value in looking at raw residuals on the response scale...I don't know; I'll have to think about this.

lukembrowne commented 6 years ago

Thank you for taking the time to look into it. I see now the issue with plotting the partial residuals on a scale they were not intended for, and the default behavior with visreg makes sense.

The option to add in residual type would be great!