stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

`as.matrix.projection()` for GAMs: Duplicated linear term #150

Open fweber144 opened 3 years ago

fweber144 commented 3 years ago

On branch develop (commit 96d8e11d6f0ad0a1f7836c51700cb676b364bee5), this reprex:

options(mc.cores = parallel::detectCores(logical = FALSE))
data("kidiq", package = "rstanarm")
kidiq <- head(kidiq, 45)
fit_gauss <- rstanarm::stan_gamm4(
  kid_score ~ s(mom_iq),
  data = kidiq,
  iter = 500,
  seed = 734572
)

library(projpred)
prj <- project(fit_gauss, solution_terms = c("mom_iq", "s(mom_iq)"),
               ndraws = 25)
colnames(as.matrix(prj))

gives

 [1] "b_Intercept"   "b_mom_iq"      "b_s(mom_iq).1" "b_s(mom_iq).2"
 [5] "b_s(mom_iq).3" "b_s(mom_iq).4" "b_s(mom_iq).5" "b_s(mom_iq).6"
 [9] "b_s(mom_iq).7" "b_s(mom_iq).8" "b_s(mom_iq).9" "sigma"

(so the output matrix includes a linear term for the smoothed predictor) whereas

colnames(as.matrix(fit_gauss))

(correctly) gives

 [1] "(Intercept)"           "s(mom_iq).1"          
 [3] "s(mom_iq).2"           "s(mom_iq).3"          
 [5] "s(mom_iq).4"           "s(mom_iq).5"          
 [7] "s(mom_iq).6"           "s(mom_iq).7"          
 [9] "s(mom_iq).8"           "s(mom_iq).9"          
[11] "sigma"                 "smooth_sd[s(mom_iq)1]"
[13] "smooth_sd[s(mom_iq)2]"

Perhaps the problem is not really with as.matrix.projection(), but rather with project(), which would need to project onto the submodel with the smoothed term only (and not an additional linear term), even if solution_terms = c("mom_iq", "s(mom_iq)")?

There's another issue with as.matrix.projection() for GAMs visible here (the smooth_sd parameters) which I'll open a new issue for.

fweber144 commented 3 years ago

There's another issue with as.matrix.projection() for GAMs visible here (the smooth_sd parameters) which I'll open a new issue for.

This is issue #151 now.

AlejandroCatalina commented 3 years ago

This has to do with the way we split the original fit object's formula to do the forward search. We defined some conventions that seemed reasonable, but they may give rise to issues like this. In this case, for instance, it was decided that we also want to project the main effect of the smoothing term, which serves as a way of asking "do I need the smoothing term or is the linear term enough in terms of predictive accuracy?".
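To illustrate the convention described here, the following is a hypothetical helper for illustration only (not projpred's actual internals, and the simple regex ignores smooth specifications such as s(x, k = 5)): each smooth term s(x) in the reference model's formula contributes both a linear candidate x and the smooth candidate s(x) to the search space.

```r
# Hypothetical illustration of the search-space convention (not projpred code):
# each smooth term s(x) contributes both a linear and a smooth candidate term.
candidate_terms <- function(ref_terms) {
  out <- character(0)
  for (tt in ref_terms) {
    if (grepl("^s\\(.*\\)$", tt)) {
      # extract the variable name from s(...); simplistic on purpose
      out <- c(out, sub("^s\\((.*)\\)$", "\\1", tt), tt)
    } else {
      out <- c(out, tt)
    }
  }
  unique(out)
}

candidate_terms(c("s(mom_iq)", "mom_hs"))
# c("mom_iq", "s(mom_iq)", "mom_hs")
```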

fweber144 commented 3 years ago

In this case, for instance, it was decided that we also want to project the main effect of the smoothing term, which serves as a way of asking "do I need the smoothing term or is the linear term enough in terms of predictive accuracy?".

That definitely makes sense, but I think we need to differentiate between the (forward) search and the projection onto a pre-specified submodel. My concern here is the latter.

In any case, a submodel shouldn't contain both the nonsmoothed term (e.g., mom_iq) and the smoothed term (e.g., s(mom_iq)), right? For example, if you fit a model with both of these terms in rstanarm, you'll get a lot of convergence issues. So including both terms leads to a degenerate model. Therefore, I think projpred should catch that case (when projecting onto a pre-specified submodel) and either

EDIT: An earlier version referred to mgcv, where fitting a model with a nonsmoothed as well as a smoothed term would give 0 reference degrees of freedom for the smoothed term and an F-statistic as well as a p-value which are NaN. However, that doesn't necessarily seem to be the case; see e.g.:

library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 40)
b <- gam(y ~ x2 + s(x2), data = dat)
summary(b)

When debugging the projection of the first draw in the reprex above, however, we do get 0 reference degrees of freedom for the smoothed term and an F-statistic as well as a p-value which are NaN.

AlejandroCatalina commented 3 years ago

As far as I remember, if the smooth is included first, the linear term can't be added, but if the linear term is added first, the smooth can be included later. So a model with both could exist, and from the modeling perspective it could make sense in the right model, but I agree they have to be properly constrained and so on. Probably @


fweber144 commented 3 years ago

The models I have tested with rstanarm and mgcv had the linear term included first, so that order didn't help.

fweber144 commented 3 years ago

I have edited the comment above.

AlejandroCatalina commented 3 years ago

Interestingly, if the linear term appears earlier and the smooth also appears in the solution terms, that means the projection actually performs better with both. We may need to add diagnostics that catch these potential issues in the projections.

fweber144 commented 3 years ago

Sorry, I don't understand what you mean by

the linear term appears earlier and the smooth appears in solution terms

Could you explain that?

AlejandroCatalina commented 3 years ago

Yes.

The way we coded the forward search implies that the only case where both the smooth and the linear term would appear in the solution is if the linear term is chosen first, indicating that it is a good predictor even without the smooth. Now it is possible that the smooth adds extra predictive power and is therefore chosen as part of the solution as well. I will check whether this is indeed the case, that is, whether including the smooth first actually removes the linear term. If so, then the smooth may or may not be part of the optimal subset of predictors, and in this case it is fully possible for the projection including both terms to be unstable and have convergence issues, which is something we can improve on.

fweber144 commented 3 years ago

Yes, that's the search behavior I would have expected, although I think the linear term needs to be removed whenever the smoothed term is included, not only

if including the smooth first actually removes the linear term

Anyway, as I said above, I'm only referring to project() here in this issue, not to the search. I don't know whether there are similar issues with the search, but I think it's already within project() that projpred should catch the case where the user supplies both the linear and the smoothed term to solution_terms.
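As a sketch of what such a guard inside project() could look like (the function name and the simple regex for smooth terms are hypothetical, not part of projpred's API, and the regex ignores smooth specifications such as s(x, k = 5)):

```r
# Hypothetical sketch (not projpred API): drop a bare linear term when the
# corresponding smooth term is also supplied in solution_terms.
drop_duplicated_linear_terms <- function(solution_terms) {
  # variables wrapped in s(...), e.g. "s(mom_iq)" -> "mom_iq"
  smooth_vars <- sub("^s\\((.*)\\)$", "\\1",
                     grep("^s\\(.*\\)$", solution_terms, value = TRUE))
  dup <- intersect(smooth_vars, solution_terms)
  if (length(dup) > 0) {
    warning("Dropping linear term(s) duplicated by smooth term(s): ",
            paste(dup, collapse = ", "))
    solution_terms <- setdiff(solution_terms, dup)
  }
  solution_terms
}

drop_duplicated_linear_terms(c("mom_iq", "s(mom_iq)"))
# warns and returns "s(mom_iq)"
```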

avehtari commented 2 years ago

In any case, a submodel shouldn't contain both the nonsmoothed term (e.g., mom_iq) and the smoothed term (e.g., s(mom_iq)), right?

It depends on whether the s() term includes a linear term inside. There may have been some confusion, as I had thought s() would not contain the linear term, but based on this comment, it does. I've also checked this at some point, but I can't remember the details, and the mgcv documentation doesn't say it clearly.

fweber144 commented 2 years ago

Yes, that's true. The mgcv documentation is a bit scarce with regard to the mathematical details of the smoothing terms (e.g., s()). However, ?mgcv::gam.models and the beginning of chapter 18 of "The R Book" by Michael Crawley make me think that an additional linear term is not necessary when using s() (so the typical usage of s() would be s(x) and not s(x) + x).
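This can also be checked directly on the basis that mgcv constructs. The following sketch uses mgcv::smoothCon() (the variable names are just for illustration) to show that the default thin plate regression spline basis of s() already spans the linear function, which would make an additional linear term redundant:

```r
library(mgcv)

x <- seq(0, 1, length.out = 50)
# construct the design matrix of the default thin plate basis for s(x)
sm <- smoothCon(s(x), data = data.frame(x = x))[[1]]
X <- sm$X

# regress the linear function x on the basis columns; near-zero residuals
# mean that x lies in the column space of the s(x) basis
fit_lin <- lm(x ~ X - 1)
max(abs(residuals(fit_lin)))  # ~ 0 up to numerical precision
```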

fweber144 commented 2 years ago

The example at the beginning of section 5.2 of Wood (2006), together with the notation introduced earlier in the book, makes it clear that an additional linear term is not needed.

References:

Wood, S. N. (2006). Generalized additive models: An introduction with R (1st ed.). Chapman & Hall/CRC.