refunders / refund

Regression with functional data

variable selection in fosr #78

Closed iclocher closed 7 years ago

iclocher commented 7 years ago

fosr.vs.txt frtmatNoNA.txt flwmatNoNA.txt @yakuan-chen

Hello,

I am trying to use the fosr.vs() function. My functional data consist of a matrix with 7 columns (the longitudinal observations for each subject) and 55 rows (the number of subjects). Additionally, I have 8 scalar variables measured for each subject. I ran the example at the end of the documentation under “?fosr.vs”, and it works. So I replicated it for my data, i.e., I organized the data the same way and made sure that all the objects had the same format (list, matrix, data frame). This resulted in a data frame called “datai” with 55 rows (the number of replicates), whose first 8 columns are the 8 independent variables that I want to test for my functional data, and whose 9th column is “Yilo”, the matrix of functional data (nrow=55, ncol=7). I then call the function: fit = fosr.vs(Yilo~., data=datai, nbasis=4, method="ls") But I get the following error message: "Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom".

I read somewhere that fosr.vs uses “smooth.construct.tp.smooth.spec(object, dk$data, dk$knots)” internally, but I am not sure what I should change to influence this internal call. I believe I already got this error message when using too few or too many knots in pfr(), but it worked when I used 4 to 7 knots.

I should also mention that the fosr() function works on this data, with one or more independent variables. I was just thinking that fosr.vs() would give me the answer in one step, rather than me running fosr() manually with one independent variable at a time, then two, then three, etc., and comparing the AICs.

I attached my code and associated text files. Please let me know your thoughts! Thanks for your time!

fabian-s commented 7 years ago

The error is due to the very low number of gridpoints of your functional response. It actually occurs in a call to fpca.sc issued from fosr.vs, and the arguments you pass to fosr.vs don't affect the basis dimension used in that call.
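The error message itself can be reproduced outside refund. The sketch below is an illustration under assumptions, not the code inside fpca.sc: it fits a plain mgcv thin-plate smooth to simulated noise observed on a 7-point grid and asks for more basis functions than there are distinct grid points. The names d, grid, too_many and ok are made up for the example.

```r
library(mgcv)

set.seed(1)
n_subj <- 55                          # subjects, as in the question
grid   <- seq(0, 1, length.out = 7)   # only 7 distinct grid points

# Long-format data: every subject is observed on the same 7-point grid.
d <- data.frame(
  t = rep(grid, times = n_subj),
  y = rnorm(n_subj * length(grid))    # simulated noise, not the real data
)

# k = 10 basis functions cannot be built from 7 unique values of t,
# so the smooth constructor stops with an error; capture its message.
too_many <- tryCatch(
  gam(y ~ s(t, k = 10), data = d),
  error = function(e) conditionMessage(e)
)
print(too_many)

# A basis dimension no larger than the number of unique grid points fits.
ok <- gam(y ~ s(t, k = 5), data = d)
```

The only thing that changes between the failing and the working fit is k, which is why the nbasis argument passed to fosr.vs cannot help as long as the internal call uses its own default.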

@yakuan-chen:

line w <- fpca.sc(a, var=T) causes this error because fpca.sc by default uses 10 basis functions to fit the smooth mean of the functional responses, but the functional response here has only 7 gridpoints. Please fix this, either by

  1. allowing users to hand over their own args for the fpca.sc call or
  2. adapting the default args for fpca.sc for low-resolution functional responses or
  3. simply not using a smoothed covariance and doing a simple PCA for low-res Y; the former might be overkill anyway.

iclocher commented 7 years ago

Thank you @fabian-s !

jeff-goldsmith commented 7 years ago

@iclocher: as @fabian-s said, the issue had to do with the call to fpca.sc from fosr.vs. this step is used to estimate the residual covariance matrix, and requires quite a few points per curve to be effective. https://github.com/refunders/refund/commit/7e3cc69f89cb6e61bc9048587bc3e2a4d216521c addresses this by using the empirical covariance for low-resolution responses (an even more basic approach than option 3 above).
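As a rough illustration of what an empirical covariance means for a response with few grid points, the base R sketch below computes the sample covariance of a residual matrix directly, with no smoothing step. It is not the code from the commit: resid_mat is simulated noise standing in for the residuals of an initial fit, and the other names are made up for the example.

```r
set.seed(2)
n_subj <- 55   # curves (rows)
n_grid <- 7    # grid points per curve (columns)

# Hypothetical residual matrix: one row per subject, one column per grid point.
resid_mat <- matrix(rnorm(n_subj * n_grid), nrow = n_subj, ncol = n_grid)

# Empirical covariance across grid points: a 7 x 7 matrix with no basis
# expansion, so no minimum number of grid points is needed.
emp_cov <- cov(resid_mat)

# Optional: an ordinary eigendecomposition, which is what a "simple PCA"
# of this covariance amounts to.
eig <- eigen(emp_cov, symmetric = TRUE)
pve <- cumsum(eig$values) / sum(eig$values)   # cumulative variance explained
```

With 55 curves and 7 grid points, the 7 x 7 matrix is estimated from far more rows than columns, so skipping the smoothing step is a reasonable trade here.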

this change appears in the refundDevel, which can be installed using devtools::install_github("refunders/refund", ref="devel").

one other tip: in your attached file, i suggest using method = "grMCP" rather than method = "ls" to fit the model. i don't think you have a large enough sample size to estimate all the spline coefficients using least squares, but group MCP will be able to handle this problem.
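For reference, a minimal sketch of the data layout and the suggested call, using simulated stand-ins rather than the attached files. X, Y and the seed are made up for the example, and the fosr.vs call only runs if refund is installed. It is wrapped in tryCatch because the outcome depends on whether the installed version contains the fix mentioned above.

```r
set.seed(3)
n_subj   <- 55   # subjects
n_grid   <- 7    # observations per curve
n_scalar <- 8    # scalar predictors

X <- matrix(rnorm(n_subj * n_scalar), n_subj, n_scalar)   # hypothetical predictors
Y <- matrix(rnorm(n_subj * n_grid),   n_subj, n_grid)     # hypothetical response

# Eight scalar columns first, then the response stored as ONE matrix column.
# Assigning with `$` keeps Y as a matrix; data.frame(X, Y) would split it up.
datai <- as.data.frame(X)
datai$Yilo <- Y

# Group MCP instead of least squares, as suggested above.
if (requireNamespace("refund", quietly = TRUE)) {
  fit <- tryCatch(
    refund::fosr.vs(Yilo ~ ., data = datai, nbasis = 4, method = "grMCP"),
    error = function(e) conditionMessage(e)   # report the message, don't abort
  )
}
```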

iclocher commented 7 years ago

Indeed, that worked. Thank you for your fast, great advice!