jgellar / pcox

Penalized Cox regression models

Slightly off topic: estimating smooths from gam() #18

Closed jgellar closed 9 years ago

jgellar commented 9 years ago

So this is slightly off topic, but it came up in a meeting I was having today and I was wondering if you had a good solution.

I'm working with a student here on a model we came up with and are fitting with mgcv::gam(). One thing we were interested in was looking at coverage probabilities for confidence intervals created based on gam's estimate of the variance/covariance matrix. We were getting poor performance, so we went down to the simplest case we could come up with: a linear regression with a smooth effect of a scalar covariate. Coverage was still quite poor.

Here's what we did: we simulated some data according to the model $E[Y] = 20 + 3x$, but estimated it with gam(y ~ s(x)). The problem, we realized, is that the model is over-parameterized: since it includes an intercept, there is no way for the model to know whether some of the "intercept" is part of the "f(x)". So estimation would be impossible without some constraint on f. Do you know how these constraints are determined?

For the above scenario, we could fix it by just removing the intercept, or by considering the intercept to be part of f(x) and just looking at $\hat y$. Doing this does result in confidence intervals with 95% coverage, btw. But what if the model was $\alpha + f(x) + g(z)$? Removing $\alpha$ doesn't help you untangle f vs. g. So in general, what are $\hat f$ and $\hat g$ trying to estimate? And in particular, how can we evaluate the standard error estimates that gam() provides? Any ideas?

Btw, in pcox we will also get this effect again, but the "intercept" is absorbed into the baseline hazard, making it even more troublesome.

fabian-s commented 9 years ago

Smooth terms $f(x_p)$ of a covariate $x_p$ in mgcv are (typically) estimated subject to the constraint $\sum_{i=1}^n f(x_{pi}) = 0$, which ensures that the intercept is identifiable.
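To make the constraint concrete for the simulated model in the question, here is a short worked decomposition. It assumes the constraint is applied over the observed $x_i$, and $\bar x$ (not in the original post) denotes their sample mean:

$$
20 + 3x = \underbrace{(20 + 3\bar x)}_{\alpha} + \underbrace{3\,(x - \bar x)}_{f(x)},
\qquad
\sum_{i=1}^n f(x_i) = 3 \sum_{i=1}^n (x_i - \bar x) = 0 .
$$

So under this constraint the smooth is trying to estimate the centred function $3(x - \bar x)$, not $3x$, and the intercept is trying to estimate $20 + 3\bar x$, not 20. By the same reasoning, in $\alpha + f(x) + g(z)$ each of f and g would be centred over its own observed covariate values, with both means absorbed into $\alpha$.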

This will help re CI performance and related issues.

Also take a look at ?predict.gam, search for type-option "iterms" and argument unconditional. Don't construct the CIs from scratch yourself, use the se's for the function estimates returned by predict.gam().
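A minimal sketch of what that could look like for the simulated example from the question. The sample size, noise level, seed, and the use of REML are assumptions for illustration, not the settings from the original simulation, and the pointwise coverage computed here is from a single replicate only:

```r
# Sketch only: n, the noise sd, the seed and method = "REML" are assumptions.
library(mgcv)

set.seed(1)
n <- 200
x <- runif(n)
y <- 20 + 3 * x + rnorm(n)   # true model from the question: E[Y] = 20 + 3x

fit <- gam(y ~ s(x), method = "REML")

# Term-wise estimates and standard errors from predict.gam().
# type = "iterms": se's for the smooth include the uncertainty about the
#   intercept / overall mean.
# unconditional = TRUE: use the covariance matrix corrected for smoothing
#   parameter uncertainty, when available.
pr <- predict(fit, type = "iterms", se.fit = TRUE, unconditional = TRUE)
f_hat <- pr$fit[, 1]      # estimated smooth of x at the observed x values
f_se  <- pr$se.fit[, 1]   # matching standard errors

# The smooth is centred over the data, so compare it with the centred
# truth 3 * (x - mean(x)) rather than with 3 * x.
f_true <- 3 * (x - mean(x))

lower <- f_hat - 1.96 * f_se
upper <- f_hat + 1.96 * f_se
covered <- mean(lower <= f_true & f_true <= upper)  # share of x_i covered
```

To estimate coverage properly you would wrap this in a loop over many simulated data sets and average the indicator `lower <= f_true & f_true <= upper` at each fixed x value across replicates.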

> Btw, in pcox we will also get this effect again, but the "intercept" is absorbed into the baseline hazard, making it even more troublesome.

i suspect that this is at least one of the reasons for #17.....

i'm closing this since it's unrelated to pcox.