scalar-on-function regression merging

huangracer commented 9 years ago

[x] merge pfr and fgam
[x] renaming psfr
[x] allowing random effects in user-friendly way
[x] build preprocessing functions
[x] fix issues from preprocessing
[x] summary method (use summary.fgam?)
[x] merge fpcr
[ ] add fpcr crossvalidation
[x] merge peer
[ ] allow 2-D functional predictors (especially for fpc terms)
[ ] testing for functional effects
[ ] expand fpc to do FPCR-C using various FPCA methods

remarks:

sparse data handling: through preprocessing functions including fpca, interpolate, smoothing
pass preprocessing information to prediction function

@jgellar @jeff-goldsmith @swihart

huangracer commented 9 years ago

@swihart will think about testing Thoughts of @swihart on current implementation: rlrt.pfr

We assume the model that the user fits is the $H_a$ model. Then, based on a user specified option, go through the trouble of fitting $H_0$ for the desired test (the $H_0$ fit in mgcv:gam()) and then passing the necessary $H_a$ and $H_0$ stuff to RLRsim() for the p-value/test.
When rlrt.pfr was originally written, along with predict.pfr, it demanded a rewrite of pfr. Part of the rewrite of pfr was for it to subsume lpfr, the other part to return all the info we would need for a user-easy testing and predicting. As it stands now, pfr fits pfr and lpfr, and assumes that any testing will be done on the last function specified. We can only test one function at a time. For better or worse, pfr returns the X used in the gam() fit.

Thoughts of @swihart on mgcv:anova.gam

Since so much work is going into sofr to use the mgcv:gam as the fitting engine, we might wish to just use testing functions from mgcv.
However, mgcv:anova.gam approach was analyze in Scheipl and Greven's paper, however mgcv notes the following: (the Scheipl et al. recommendation is not used directly, because it only applies in the Gaussian case, and requires model refits, but it is available in package RLRsim).
One concern is we use the paraPen = D option in gam for pfr(), and this line was in the Details section of the help file for mgcv:anova.gam:
Default P-values for parameteric terms that are penalized using the paraPen argument will not be good
Running the following after Mat's first FGAM example:

mgcv::anova.gam(fit) Family: gaussian Link function: identity Formula: y ~ +te(x = X.omat, z = X.tmat, by = L.X, k = c(8, 8), bs = "ps", m = list(c(2, 3), c(2, 3))) <environment: 0x7fb2fb796060> Approximate significance of smooth terms: edf Ref.df F p-value te(X.omat,X.tmat):L.X 5.999 6.000 1.524 0.186

as compared to 1st example in pfr help file:

mgcv::anova.gam(pfr.obj.t1$fit) Family: gaussian Link function: identity Formula: Y ~ X - 1 <environment: 0x7fb2f9dde6d8> Parametric Terms: df F p-value X 152 212.2 <2e-16

Q. Is mgcv::anova.gam(pfr.obj.t3$fit, pfr.obj.t1$fit) equivalent to rlrt.pfr(pfr.obj.t3, "inclusion") A. No.

> mgcv::anova.gam(pfr.obj.t3$fit, pfr.obj.t1$fit)
Analysis of Deviance Table

Model 1: Y ~ X - 1
Model 2: Y ~ X - 1
  Resid. Df Resid. Dev    Df Deviance
1    242.31     5998.9               
2    243.36     6013.4 -1.05  -14.495

## test="F" pvalue
> mgcv::anova.gam(pfr.obj.t3$fit, pfr.obj.t1$fit, test="F")
Analysis of Deviance Table

Model 1: Y ~ X - 1
Model 2: Y ~ X - 1
  Resid. Df Resid. Dev    Df Deviance      F Pr(>F)
1    242.31     5998.9                             
2    243.36     6013.4 -1.05  -14.495 0.5576 0.4641

## test="LRT" pvalue
> mgcv::anova.gam(pfr.obj.t3$fit, pfr.obj.t1$fit, test="LRT")
Analysis of Deviance Table

Model 1: Y ~ X - 1
Model 2: Y ~ X - 1
  Resid. Df Resid. Dev    Df Deviance Pr(>Chi)
1    242.31     5998.9                        
2    243.36     6013.4 -1.05  -14.495   0.4633

> rlrt.pfr(pfr.obj.t3, "inclusion")

$p.val
[1] 0.3428

$test.stat
[1] 0.04894662

Q. Is mgcv::anova.gam(pfr.obj.t1$fit) equivalent to rlrt.pfr(pfr.obj.t1, "inclusion") A. No.

> mgcv::anova.gam(pfr.obj.t1$fit, test="F")

Family: gaussian 
Link function: identity 

Formula:
Y ~ X - 1
<environment: 0x7fb2f9dde6d8>

Parametric Terms:
   df     F p-value
X 152 212.2  <2e-16  

> rlrt.pfr(pfr.obj.t1, "inclusion")
[1] "warning: test 'confounding' is a LRT and a refit of the full (alternative) model"
[1] "was completed with method = 'ML'"
$p.val
[1] 6e-04

$test.stat
[1] 8.692346

@fabian-s

swihart commented 9 years ago

Simon himself admits p-values are crude and do not solve all the Var=0 testing problems.

source: https://stat.ethz.ch/pipermail/r-help//2012-June/315348.html

Hi Martijn,

Irrespective of the p-value, 'bam' and 'lmer' agree that the variance component for 'Placename' is practically zero. In the 'bam' output see the 'edf' for s(Placename), or for a more direct comparison call gam.vcomp(m1).

As mentioned in ?summary.gam the p-values for "re" terms are fairly crude (and certainly don't solve all the usual problems with testing variance components for equality to zero), so I would not take them too seriously for testing whether your random effect is exactly zero, when the estimates/predictions are this close to zero (the typical random effect size for Placename is about 1e-8, after all). That said, when I randomly re-shuffle Placename, so that the null hypothesis is true, then the p-value distribution does look uniform, as it should, despite some edfs even smaller than that for the original data.... So the low p-value may simply reflect the common problem that even very tiny effects are often statistically significant at high sample sizes, I suppose. Anyway, unless effects of size 1e-8 are meaningful here, I would drop the Placename term.

best, Simon

ps. I'm not sure what effect the rather heavy tails on the residuals may be having here?

fabian-s commented 9 years ago

re()- random effects implemented for fgam in 62145903ef7ab69953543306c08004c76d3645c3

philreiss commented 9 years ago

There was some discussion of incorporating FPCR terms in pfr(). This would be nice, but there are a couple of issues with that: (1) fpcr() uses vector inner products (as in Reiss and Ogden 2007, following Marx and Eilers 1999) rather than functional inner products (2) fpcr() has CV functionality that would be hard to incorporate in pfr().

For the time being, we should probably leave FPCR terms out of pfr(), and add a note in the help file for fpcr() about point (1) above.

fabian-s commented 9 years ago

coef method for lf/af/re added in https://github.com/refunders/refund/commit/4861eb66b3deda812377760dd59cc61405845ca8

jgellar commented 9 years ago

Sorry about the close/re-open, I hit the wrong button

jgellar commented 9 years ago

FYI, there is still no summary() method. summary.gam() works for lf, lf.vd, and af terms, but not when re() or peer() terms are present.

jgellar commented 9 years ago

@mwmclean I'm not sure if you got a chance to look into summary.pfr, but it turned out to be a simpler solution than we thought. The problem was that inside summary.gam (in summary.gam -> reTest -> recov), Wood calls np <- coef(object) to get the number of parameters. Since we have a pfr.coef() method this is returning the wrong thing. The simple solution is for summary.pfr() to strip off the pfr class and just call summary on the gam object. So that's what I did in efe0da0c81bdffb3f68c66c90ff6241052fdaf99.

@fabian-s There are no more errors with summary(), but when you call it on a peer model it is still printing the entire X matrix (because this had to be passed through to the basis constructor as an element of xt). Can you look into a way around this? I've tried lots of things and can't come up with a solution. This is the issue I referenced in #38 .

jgellar commented 9 years ago

Inspired by Fabian's update in 1c5039d1dba7b579affb9787953608a7c2e754f6, I decided to implement FPCR within pfr. This has been done here 01761c8baa43e363f16ac5c89929dbb81f6d3284.

It's working well, including all methods, though I admit I have not done much testing. If any of you guys want to play with it that would be great. The fit is matching that produced by fpcr almost identically. The differences are do to me using our L matrix system for the numerical integration. The coefficient functions are off by a factor of about ncol(X) - however, the scale matches that produced by lf.

My implementation allows for any mgcv basis to take the place of the b-spline basis that is applied to X before doing the PCA. "ps" with k=40 (and sp=0) is the default, as it is with fpcr.

Things that it can't do that fpcr does:

Can't accept 2-D predictors. I have given some thought to allowing all of the pfr functions to accept 2-D predictors at some point.
Doesn't do the cross-validation. My thought would be the easiest way to implement this would be to write a wrapper that calls pfr a bunch of times in order to optimize any smoothing parameters. It doesn't need to be part of the pfr function itself.

These improvements will be for future updates.

refunders / refund

scalar-on-function regression merging #20