stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Inconsistency in reference model formula between GAMs and GAMMs #384

Open fweber144 opened 1 year ago

fweber144 commented 1 year ago

There is an inconsistency in the reference model formula between GAMs and GAMMs (here shown with objects from the unit tests):

> refmods$rstanarm.gam.gauss.stdformul.with_wobs.without_offs.trad$formula
y_gam_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s(s.1)
> refmods$rstanarm.gamm.gauss.stdformul.with_wobs.without_offs.trad$formula
y_gamm_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s.1 + 
    s(s.1) + (xco.1 | z.1)

so the GAM formula lacks the extra linear s.1 term that appears in the GAMM formula. This is probably related to the output of formula.gamm4():

> formula.gamm4(fits$rstanarm.gam.gauss.stdformul.with_wobs.without_offs)
y_gam_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s(s.1)
> formula.gamm4(fits$rstanarm.gamm.gauss.stdformul.with_wobs.without_offs)
y_gamm_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s.1 + 
    s(s.1) + (xco.1 | z.1)
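
The difference between the two formulas can also be checked programmatically. The following is a minimal sketch that uses only base R and does not touch projpred or the unit-test objects: the two formulas are copied by hand from the output above, and the variable names (f_gam, f_gamm, labs_gam, labs_gamm, extra_terms) are chosen here for illustration only.

```r
# Formulas copied by hand from the output above (not extracted from the
# refmods or fits objects, which are only available in the unit tests).
f_gam <- y_gam_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s(s.1)
f_gamm <- y_gamm_gauss ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2 + s.1 +
  s(s.1) + (xco.1 | z.1)

# terms() only parses the formula symbolically, so neither the data nor
# the functions s() and `|` need to be evaluated here.
labs_gam <- attr(terms(f_gam), "term.labels")
labs_gamm <- attr(terms(f_gamm), "term.labels")

# Terms that occur in the GAMM formula but not in the GAM formula.
extra_terms <- setdiff(labs_gamm, labs_gam)
print(extra_terms)

# The smooth term s(s.1) is present in both formulas, whereas the plain
# s.1 term is present only in the GAMM formula.
print("s(s.1)" %in% labs_gam)
print("s.1" %in% labs_gam)
```

Here, extra_terms should contain the plain s.1 term in addition to the group-level term, which is the inconsistency described above.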

However, this doesn't seem to have any consequences for the candidate models in the forward search. First, running

debug(search_forward)
vs_gam <- varsel(refmods$rstanarm.gam.gauss.stdformul.with_wobs.without_offs.trad,
                 nclusters = 1, nclusters_pred = 1)

and then stepping through search_forward() in the debugger until size 2 gives

> cands
[1] "s(s.1)" "xco.1"  "xco.2"  "xco.3"  "xca.1"  "xca.2"  "s.1"

so both s.1 and s(s.1) are considered as candidates (as desired). Similarly, for the GAMM, we get from

vs_gamm <- varsel(refmods$rstanarm.gamm.gauss.stdformul.with_wobs.without_offs.trad,
                  nclusters = 1, nclusters_pred = 1)

and debugging search_forward() until size 2:

> cands
[1] "s(s.1)"    "(1 | z.1)" "xco.1"     "xco.2"     "xco.3"     "xca.1"     "xca.2"     "s.1"      

so again, both s.1 and s(s.1) are considered as candidates (as desired).