tomwenseleers opened this issue 6 years ago
Hi, thanks for this note.
I'm not familiar with scam, but I have thought a little bit about adding a method for GAMs. The recipe would be more difficult than what we currently do with GLMs, and there isn't any way around it. In the GLM case, it's relatively easy to form interval estimates: we know how to calculate standard errors of the regression coefficients, and those uncertainties can be propagated linearly to form interval estimates on the scale of the linear predictor. The intervals on the linear predictor scale can then be transformed to the response scale through the inverse link function. We wrote about this idea in the GLM vignette that is distributed with the package.
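A minimal sketch of the GLM recipe described above, written in plain Python for illustration. It computes the linear predictor eta = x'beta, propagates the coefficient covariance linearly (Var(eta) = x'Vx), builds a Wald interval on the link scale and maps both endpoints through the inverse link. The function names `expit` and `glm_ci` and all numbers are hypothetical; in R, `coef(fit)` and `vcov(fit)` would supply `beta` and `vcov`.

```python
import math
from statistics import NormalDist


def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def glm_ci(x, beta, vcov, level=0.95, inv_link=expit):
    """Wald interval for the mean response of a GLM at one covariate row x.

    Returns (fitted mean, lower, upper) on the response scale.
    """
    p = len(beta)
    # Linear predictor: eta = x' beta
    eta = sum(x[i] * beta[i] for i in range(p))
    # Uncertainty propagates linearly onto the link scale: Var(eta) = x' V x
    var_eta = sum(x[i] * vcov[i][j] * x[j] for i in range(p) for j in range(p))
    se_eta = math.sqrt(var_eta)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Map both endpoints through the inverse link; sorting handles decreasing links
    ends = sorted((inv_link(eta - z * se_eta), inv_link(eta + z * se_eta)))
    return inv_link(eta), ends[0], ends[1]


# Hypothetical logistic-regression estimates and covariance matrix
fit, lower, upper = glm_ci([1.0, 2.0], [-1.0, 0.5], [[0.04, -0.01], [-0.01, 0.01]])
print(f"{fit:.3f} ({lower:.3f}, {upper:.3f})")
```

Because the inverse link is monotone, transforming the endpoints preserves the coverage of the link-scale interval, which is why the interval is built before the transformation rather than after.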
In the GAM case, we can't take advantage of linearity to propagate uncertainty from the coefficients to the predictions. There is some info throughout the Hastie and Tibshirani book on standard errors in GAMs, and I found that predict.gam can take an se.fit argument, so we could reasonably implement an add_ci method for GAMs based on that function. An add_pi method might be out of the question? My gut says we would have to write a parametric bootstrap function to generate data from the regression model in order to calculate prediction intervals, regression quantiles, etc. I'm not sure if there is anything in the literature about this!
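One way the parametric bootstrap idea above could look, as a minimal sketch. To keep it self-contained it assumes a Gaussian straight-line model fit by ordinary least squares rather than a GAM; with a GAM, the refit step inside the loop would be a call to the GAM fitter instead. The helpers `fit_line`, `quantile` and `bootstrap_pi` are hypothetical names. Each replicate simulates a new data set from the fitted model, refits, simulates a future observation at `x0`, and records the prediction error; the interval is the point prediction plus quantiles of those errors.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sigma)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (n - 2))


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (idx - lo)


def bootstrap_pi(xs, ys, x0, level=0.95, n_boot=2000, seed=1):
    """Parametric-bootstrap prediction interval for a new observation at x0."""
    rng = random.Random(seed)
    a, b, sigma = fit_line(xs, ys)
    pred0 = a + b * x0
    errors = []
    for _ in range(n_boot):
        # 1. Generate a new data set from the fitted model
        ys_star = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        # 2. Refit the model to the simulated data
        a_s, b_s, _ = fit_line(xs, ys_star)
        # 3. Simulate a future observation at x0 from the fitted model
        y0_star = a + b * x0 + rng.gauss(0.0, sigma)
        # 4. Record the prediction error of the refitted model
        errors.append(y0_star - (a_s + b_s * x0))
    errors.sort()
    alpha = 1.0 - level
    return pred0 + quantile(errors, alpha / 2), pred0 + quantile(errors, 1 - alpha / 2)


# Hypothetical data: y = 1 + 2x + N(0, 0.5^2) noise
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
print(bootstrap_pi(xs, ys, x0=2.5))
```

The errors combine both sources of uncertainty the thread distinguishes: the refit captures uncertainty in the estimated mean, and the simulated `y0_star` adds observation-level noise.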
Ha, scam inherits from gam, so I think the methods for both would be the same. I would indeed use the built-in se.fit argument to estimate standard errors for these models! Maybe relevant for calculating confidence intervals: https://stats.stackexchange.com/questions/33327/confidence-interval-for-gam-model/33328
For prediction intervals for GAMs, I found these public threads: https://stat.ethz.ch/pipermail/r-help/2011-April/275632.html https://stackoverflow.com/questions/18909234/coding-a-prediction-interval-from-a-generalized-additive-model-with-a-very-large
Good digging!
I'll take a look at these and then see what I can do once work on GLMMs is wrapped up.
Oh yes, and I also remember that Ben Bolker outlined a method that he referred to as "population prediction intervals" in a chapter of his book, http://ms.mcmaster.ca/~bolker/emdbook/chap7A.pdf, which applies to any fitting method that provides SEs on the fitted coefficients. See here for an application to a nonlinear mixed model: https://stats.stackexchange.com/questions/231074/confidence-intervals-on-predictions-for-a-non-linear-mixed-model-nlme The only thing I was never quite sure about is whether these are confidence intervals on the mean (I think that's what they are, but this makes the term population prediction intervals a little confusing) or prediction intervals on future individual data points. Perhaps the method could be adapted to calculate both, depending on whether the residual (observation-level) variance is included or not? I'm also not sure how one should deal with the variation from the random effects part in the case of mixed effects models (including the uncertainty in the variance parameters for the random effect factors, I guess, as they do here: https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html). Maybe this is useful for you if you want to calculate confidence and prediction intervals for the trickier nonlinear model cases?
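A minimal sketch of the population prediction interval recipe from the linked chapter: draw parameter vectors from a multivariate normal centred on the estimates with their covariance matrix, compute the deterministic prediction for each draw, and take quantiles. The optional `sigma` argument adds observation-level Gaussian noise to each draw, which is the adaptation suggested in the comment above for getting a prediction interval rather than an interval on the mean. The model y = a*exp(-b*x), the estimates, the covariance matrix and `sigma` are all hypothetical, as are the helper names.

```python
import math
import random


def chol2(v):
    """Cholesky factor of a 2x2 covariance matrix [[v11, v12], [v12, v22]]."""
    l11 = math.sqrt(v[0][0])
    l21 = v[1][0] / l11
    l22 = math.sqrt(v[1][1] - l21 ** 2)
    return l11, l21, l22


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (idx - lo)


def population_interval(x0, est, vcov, sigma=None, level=0.95, n_sim=5000, seed=1):
    """Simulation interval for the hypothetical model y = a * exp(-b * x) at x0."""
    rng = random.Random(seed)
    l11, l21, l22 = chol2(vcov)
    preds = []
    for _ in range(n_sim):
        # Draw (a, b) from MVN(est, vcov) via the Cholesky factor
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = est[0] + l11 * z1
        b = est[1] + l21 * z1 + l22 * z2
        value = a * math.exp(-b * x0)
        if sigma is not None:
            # Observation-level noise turns this into a prediction interval
            value += rng.gauss(0.0, sigma)
        preds.append(value)
    preds.sort()
    alpha = 1.0 - level
    return quantile(preds, alpha / 2), quantile(preds, 1 - alpha / 2)


est = [10.0, 0.3]
vcov = [[0.25, 0.005], [0.005, 0.0004]]
print("interval on the mean:", population_interval(2.0, est, vcov))
print("interval for a new observation:", population_interval(2.0, est, vcov, sigma=0.5))
```

This covers only the fixed-effect parameter uncertainty plus, optionally, residual noise; it does not attempt to handle random-effect variance components, which the comment above leaves as an open question.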
Regarding the inclusion of residual variance: we wrote about this in the mixed model vignette in ciTools. It's a bit more complicated than just including or excluding the residual variance depending on the type of interval desired.
Our method differs from what is implemented in the merTools package. At one point we considered using the merTools package to form some of our interval estimates, but after a simulation study I concluded that it does not adequately control the empirical coverage probability at the nominal level (in the examples I tried). IIRC it is way too conservative.
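The kind of check mentioned above, estimating empirical coverage by simulation, can be sketched generically: simulate many data sets from a model with known parameters, compute the interval each time, and count how often it contains the true value. This is a hypothetical illustration using a normal-approximation confidence interval for the mean of a straight-line model, not the study described in the comment; the model, sample size and function names are assumptions.

```python
import math
import random
from statistics import NormalDist


def ci_mean_at(xs, ys, x0, level=0.95):
    """Normal-approximation CI for E[y | x0] under an OLS straight-line fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sigma = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    se = sigma * math.sqrt(1.0 / n + (x0 - mx) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    fit = a + b * x0
    return fit - z * se, fit + z * se


def empirical_coverage(n_rep=2000, n=100, x0=0.8, seed=42):
    """Fraction of simulated data sets whose interval contains the true mean."""
    rng = random.Random(seed)
    true_a, true_b, true_sigma = 1.0, 2.0, 0.5  # known "truth" for the simulation
    xs = [i / n for i in range(n)]
    truth = true_a + true_b * x0
    hits = 0
    for _ in range(n_rep):
        ys = [true_a + true_b * x + rng.gauss(0.0, true_sigma) for x in xs]
        lo, hi = ci_mean_at(xs, ys, x0)
        hits += lo <= truth <= hi
    return hits / n_rep


print(empirical_coverage())
```

A well-calibrated 95% method should give a value close to 0.95; a value well above it indicates an overly conservative interval and a value well below it indicates undercoverage.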
Was also wondering if it would be hard to also support mgcv::gam (generalized additive models) and scam (shape constrained additive models) models? The recipe to calculate confidence and prediction intervals for these should be quite similar to that for GLMs, right?