refunders / refund

Regression with functional data

Scaling of eigenfunctions in fpcaZZZZ functions #65

Open: jeff-goldsmith opened this issue 8 years ago

jeff-goldsmith commented 8 years ago

It would be good to address the non-uniform scaling of eigenfunctions across the functions implementing FPCA methods. Right now, fpca.sc() scales eigenfunctions to have unit L2 norm (the integral of the squared eigenfunction is 1), while the other functions treat eigenfunctions as eigenvectors (their cross product, i.e. the sum of squares over the grid, is 1). A quick look indicates some work is needed for fpca.face, fpca2s and fpca.ssvd.
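A minimal sketch of how the two conventions relate, assuming an equidistant grid with spacing h and a Riemann-sum approximation to the integral. It is written in Python for illustration only; `vector_to_function_scale` is a hypothetical helper, not part of refund. Discretizing the covariance operator gives ∫C(s,t)φ(t)dt ≈ h Σ_j C(s,t_j)φ(t_j), so if the covariance matrix has a unit-norm eigenvector v with eigenvalue μ, the matching eigenfunction evaluations are φ = v/√h with operator eigenvalue λ = hμ.

```python
import math


def vector_to_function_scale(eigvec, eigval, argvals, tol=1e-9):
    """Convert a unit-norm eigenvector of the discretized covariance matrix
    (sum of squares == 1) into eigenfunction evaluations whose squared
    integral is approximately 1, and rescale the eigenvalue to match.

    Assumes an equidistant, increasing grid; the integral is approximated
    by the Riemann sum h * sum(phi_j ** 2).
    """
    if len(eigvec) != len(argvals) or len(argvals) < 2:
        raise ValueError("eigvec and argvals must have the same length (>= 2)")
    h = argvals[1] - argvals[0]
    steps = [b - a for a, b in zip(argvals, argvals[1:])]
    if h <= 0 or any(not math.isclose(s, h, rel_tol=tol) for s in steps):
        raise ValueError("this sketch only handles equidistant, increasing grids")
    phi = [v / math.sqrt(h) for v in eigvec]  # function-orthonormal scale
    lam = eigval * h                           # operator eigenvalue scale
    return phi, lam


# Example: a constant eigenvector on a 5-point grid over [0, 1] (h = 0.25).
argvals = [0.0, 0.25, 0.5, 0.75, 1.0]
v = [1 / math.sqrt(5)] * 5               # sum(v**2) == 1 (vector convention)
phi, lam = vector_to_function_scale(v, 2.0, argvals)
print(0.25 * sum(p * p for p in phi))    # approximately 1.0 (function convention)
print(lam)                               # 0.5
```

Going the other way (eigenfunctions to eigenvectors) just multiplies by √h and divides the eigenvalue by h, so either convention can be recovered as long as the grid is returned alongside the estimates.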

fabian-s commented 8 years ago

Phil wrote:

> Is there a consensus on which scaling to use for the eigenfunctions (vector inner product = 1 vs. integral = 1)? And is this something we can harmonize for the upcoming CRAN submission?

I'd support moving to consistently having the fpcaXXX functions return evaluations of eigenfunctions, but then these would also have to receive and return the input functions' argvals and domain (the domain could probably be derived from the range of argvals, but that might be problematic for irregular data..?)

IMO the more work-intensive but ultimately cleaner solution would be defining a class for (both regular and irregular) functional data that holds argvals and domain information in addition to the function values, and changing the fpca functions so they only accept input data from that class. This is similar to what packages fda.usc and fda do with their fdata and fd classes, respectively. A rough example of how that might look for irregular data is in my last comment here.
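A rough, language-neutral sketch of what such a class might hold, written in Python rather than R. `IrregularFunData` and its fields are hypothetical and are not the classes used by refund, fda (fd) or fda.usc (fdata). The point is that argvals and domain travel with the function values, and the domain falls back to the pooled range of observed argvals only when it is not given explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IrregularFunData:
    """Hypothetical container for (possibly irregular) functional data.

    Curve i is observed at argvals[i] with values[i]; domain is the interval
    on which all curves are defined. Regular data is the special case in
    which every curve shares the same argvals.
    """
    argvals: List[List[float]]
    values: List[List[float]]
    domain: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        if not self.argvals or len(self.argvals) != len(self.values):
            raise ValueError("need at least one curve and one argvals vector per curve")
        for t, y in zip(self.argvals, self.values):
            if not t or len(t) != len(y):
                raise ValueError("each curve needs non-empty argvals matching its values")
            if any(b <= a for a, b in zip(t, t[1:])):
                raise ValueError("argvals must be strictly increasing within each curve")
        lo = min(t[0] for t in self.argvals)
        hi = max(t[-1] for t in self.argvals)
        if self.domain is None:
            # Fallback only: the pooled range of observed argvals can be
            # narrower than the true domain when no curve is observed near
            # an endpoint, so an explicit domain is safer for irregular data.
            self.domain = (lo, hi)
        elif not (self.domain[0] <= lo and hi <= self.domain[1]):
            raise ValueError("observed argvals fall outside the stated domain")

    def is_regular(self):
        """True if all curves share one grid (the 'regular' case)."""
        return all(t == self.argvals[0] for t in self.argvals)


fd = IrregularFunData([[0.1, 0.5], [0.2, 0.9]], [[1.0, 2.0], [3.0, 4.0]])
print(fd.domain, fd.is_regular())   # (0.1, 0.9) False
```

An fpca function accepting only this type could then read the grid and domain directly instead of guessing a default for argvals.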

At least for fpca.ssvd, it will also require some thought and possibly a partial rewrite of the code itself to move from vector orthonormality to function orthonormality, I think (at least for gridded data on non-equidistant grids; for irregular/sparse functions I'd have no idea where to even begin...).
@lxiao16, how hard would this be for fpca.face?

We also need to be consistent about the default for argvals when it isn't specified: fpca.sc uses 1:D, where D is the grid length, while fpca2s uses a sequence from 0 to 1 of length D (which is not actually used for scaling the eigenvectors/-functions, so it doesn't really matter ATM).
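To see why the default would start to matter once eigenvectors are rescaled, here is a sketch (Python, hypothetical helper names) that computes, for an equidistant grid, the factor 1/√h mapping a unit-norm eigenvector to an eigenfunction and the factor h applied to its eigenvalue, under each of the two defaults. With argvals = 1:D the spacing is 1, so eigenvectors and eigenfunctions coincide; with a grid on [0, 1] the spacing is 1/(D - 1), so the eigenfunction factor is √(D - 1).

```python
import math


def default_argvals(D, convention):
    """Hypothetical helper reproducing the two defaults discussed here."""
    if D < 2:
        raise ValueError("need at least two grid points")
    if convention == "index":   # 1:D
        return [float(j) for j in range(1, D + 1)]
    if convention == "unit":    # sequence from 0 to 1 of length D
        return [j / (D - 1) for j in range(D)]
    raise ValueError("convention must be 'index' or 'unit'")


def scale_factors(argvals):
    """Return (eigenfunction factor, eigenvalue factor) on an equidistant grid:
    phi = v / sqrt(h) and lambda = h * mu."""
    h = argvals[1] - argvals[0]
    return 1.0 / math.sqrt(h), h


for conv in ("index", "unit"):
    f_fun, f_val = scale_factors(default_argvals(101, conv))
    print(conv, round(f_fun, 6), round(f_val, 6))
# index 1.0 1.0
# unit 10.0 0.01
```

So with D = 101, the same eigenvector would be reported as an eigenfunction ten times larger (and an eigenvalue a hundred times smaller) under the [0, 1] default than under 1:D.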

I can try to fix fpca.ssvd and fpca2s for the release if the consensus is to move towards eigenfunctions.

jeff-goldsmith commented 8 years ago

> I'd support moving to consistently having the fpcaXXX functions return evaluations of eigenfunctions, but then these would also have to receive and return the input functions' argvals and domain (the domain could probably be derived from the range of argvals, but that might be problematic for irregular data..?)

Is there a reason this is more true for eigenfunctions than eigenvectors? At least for plotting, we should be returning the argvals no matter what, right?

> IMO the more work intensive but ultimately cleaner solution would be defining a class for (both regular and irregular) functional data that holds argvals and domain information in addition to the function values and changing the fpca functions so they only accept input data from that class. This is similar to what packages fda.usc and fda do with their fdata and fd classes, respectively. A rough example of how that might look for irregular data is in my last comment here.

I wouldn't mind updating how we input data. However:

> At least for fpca.ssvd, it will also require some thought and possibly a partial rewrite of the code itself to move from vector orthonormality to function orthonormality, I think (at least for gridded data on non-equidistant grids; for irregular/sparse functions I'd have no idea where to even begin...).

My impression is that the covariance-smoothing-based methods won't work so well for irregular data anyway; at least, I can't see a way that would work. For that sort of problem I've been using a "generative" approach, and hope to put some of that code in refund before too long.

fabian-s commented 8 years ago

As you can see above, I've tried to tackle this for fpca.ssvd and fpca2s (and changed the default argvals for fpca.sc, and added them to its return value). For data on non-equidistant grids, they return eigenVECTORS, with a warning. Irregular inputs yield an error.
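For non-equidistant grids, one way to get function (rather than vector) orthonormality is to replace the Euclidean inner product with a quadrature-weighted one, <f, g> ≈ Σ_j w_j f(t_j) g(t_j). The sketch below (Python, hypothetical helper names) uses trapezoidal weights and a weighted Gram-Schmidt step. This is an assumption about a possible approach, not what refund does. It only enforces orthonormality; getting actual eigenfunctions would additionally require eigendecomposing the weighted covariance W^{1/2} C W^{1/2} (W = diag(w)) and mapping each unit-norm eigenvector u back via φ = W^{-1/2} u.

```python
import math


def trapezoid_weights(argvals):
    """Trapezoidal-rule weights w_j so that sum_j w_j * f(t_j) approximates
    the integral of f over [t_1, t_D]; works for non-equidistant grids."""
    if len(argvals) < 2:
        raise ValueError("need at least two grid points")
    w = [0.0] * len(argvals)
    for j in range(len(argvals) - 1):
        dt = argvals[j + 1] - argvals[j]
        if dt <= 0:
            raise ValueError("argvals must be strictly increasing")
        w[j] += dt / 2
        w[j + 1] += dt / 2
    return w


def inner(f, g, w):
    """Quadrature approximation to the L2 inner product of f and g."""
    return sum(wj * fj * gj for wj, fj, gj in zip(w, f, g))


def orthonormalize_functions(vectors, argvals):
    """Modified Gram-Schmidt in the weighted inner product, so the returned
    curves satisfy inner(b_k, b_l, w) == (1 if k == l else 0) up to rounding."""
    w = trapezoid_weights(argvals)
    basis = []
    for v in vectors:
        u = list(v)
        for b in basis:
            c = inner(u, b, w)
            u = [ui - c * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(inner(u, u, w))
        if norm < 1e-12:
            raise ValueError("input curves are (numerically) linearly dependent")
        basis.append([ui / norm for ui in u])
    return basis


argvals = [0.0, 0.1, 0.4, 1.0]            # non-equidistant grid on [0, 1]
curves = [[1.0, 1.0, 1.0, 1.0], argvals]  # a constant and a linear curve
b0, b1 = orthonormalize_functions(curves, argvals)
w = trapezoid_weights(argvals)
# These should be close to 1, 0 and 1:
print(inner(b0, b0, w), inner(b0, b1, w), inner(b1, b1, w))
```

On an equidistant grid the trapezoidal weights differ from a constant h only at the two endpoints, which is why the simple rescaling by √h is a reasonable approximation there but not on uneven grids.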

In fpca2s I'm now simply estimating the scores via OLS based on the smooth eigenfunctions, and for both fpca.ssvd and fpca2s I take the variance of the scores as the estimated eigenvalues, because I couldn't work out the right scaling to get eigenvalues comparable to those of fpca.sc in the general case. @lxiao16: any thoughts on how I could have done better?
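A minimal sketch of the two steps described here, in Python with hypothetical names: OLS scores ξ̂_i = (ΦᵀΦ)⁻¹Φᵀ y_i given the D × K matrix Φ of (smoothed) eigenfunction evaluations, then the sample variance of each score column as the eigenvalue estimate. It assumes the curves y_i are already centered. If the curves are well represented by the K eigenfunctions, the score variances estimate the eigenvalues on whatever scale Φ is normalized to; with white measurement error of variance σ², each variance is inflated by σ² times the corresponding diagonal element of (ΦᵀΦ)⁻¹.

```python
import statistics


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ols_scores(Y, Phi):
    """Y: n centered curves on a grid of D points; Phi: D x K matrix of
    eigenfunction evaluations. Returns the n x K OLS score matrix."""
    D, K = len(Phi), len(Phi[0])
    PtP = [[sum(Phi[d][j] * Phi[d][k] for d in range(D)) for k in range(K)]
           for j in range(K)]
    scores = []
    for y in Y:
        Pty = [sum(Phi[d][j] * y[d] for d in range(D)) for j in range(K)]
        scores.append(solve(PtP, Pty))
    return scores


def eigenvalues_from_scores(scores):
    """Sample variance (n - 1 denominator) of each score column."""
    K = len(scores[0])
    return [statistics.variance([s[k] for s in scores]) for k in range(K)]


# Toy example: a D = 4, K = 2 basis (not real eigenfunctions) and noise-free curves.
Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
true_scores = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
Y = [[sum(Phi[d][k] * s[k] for k in range(2)) for d in range(4)] for s in true_scores]
scores = ols_scores(Y, Phi)
print(eigenvalues_from_scores(scores))   # close to [1.0, 4.0]
```

Because the variance of the OLS scores inherits the normalization of Φ, switching Φ between vector and function normalization changes these eigenvalue estimates by the same grid-dependent factor.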

jeff-goldsmith commented 8 years ago

Adding a short comment, as much for myself as anyone. I notice that cffc7bb returns argvals; as in #64, I'm leaning toward calling this index everywhere (which is why I had added it as such in f5b7860). Until I switch the whole package, though, I agree that it's better to consistently use argvals throughout.