`fpca` renovation - Githubissues

fabian-s commented 5 years ago

first needs good comments of existing code. @fabian-s @jeff-goldsmith

also needs informed decision of what we actually want to keep.

one good default for regular, one good default for irregular/sparse

fabian-s commented 5 years ago

~return object should contain tf-vectors for e.g. Yhat, efunctions?~ ignore

fabian-s commented 4 years ago

not sure about efunctions being a matrix. that seems like a step backwards.

proposed change: turn into a tfd-vector

proposed general design principle going forward: objects that represent functional data should always be tf-objects
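
A minimal sketch of the proposed conversion, assuming `efunctions` is stored as a grid-points-by-components matrix (that layout and the toy basis functions below are assumptions, not taken from the package). The `tf` step only runs if the `tf` package is installed.

```r
# Toy stand-in for the matrix-valued `efunctions` element: one column per
# eigenfunction, one row per grid point (this layout is an assumption).
grid <- seq(0, 1, length.out = 101)
efunctions <- cbind(
  sqrt(2) * sin(2 * pi * grid),
  sqrt(2) * cos(2 * pi * grid),
  sqrt(2) * sin(4 * pi * grid)
)

if (requireNamespace("tf", quietly = TRUE)) {
  # tf::tfd() treats each matrix ROW as one function, hence the transpose.
  efunctions_tf <- tf::tfd(t(efunctions), arg = grid)
  print(efunctions_tf)
}
```

With this, the eigenfunctions become a length-3 vector of functions rather than a 101 x 3 matrix whose orientation the user has to remember.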

fabian-s commented 4 years ago

not sure why the return object has class fpca and rfr_fpca -- what class hierarchy are we going to have here that requires a superclass rfr_fpca, i.e., what other subclasses than fpca will it have?

fabian-s commented 4 years ago

Methods to implement for fpca-objects:

Base:

[ ] print (!!)
[ ] predict
[ ] summary
[ ] plot
[ ] update (?)
[ ] simulate (?)
[ ] residuals

Tidyverse:
[ ] autoplot
[ ] tidy (not sure...?)
[ ] glance
[ ] augment: not too sure about what this should do -- will need to think about usecases first

.. what else?
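
To make the base-method part of the list concrete, here is a rough sketch of `print`, `fitted` and `residuals` methods on a toy class. Everything in it is hypothetical: the class name `toy_fpca`, the constructor `new_toy_fpca()`, the element names, and the plain SVD on a dense regular grid are placeholders, not the package's implementation.

```r
# Hypothetical minimal fit object; element names are illustrative only.
new_toy_fpca <- function(Y, npc = 2) {
  mu <- colMeans(Y)                      # pointwise mean function
  Yc <- sweep(Y, 2, mu)                  # centered curves (rows = curves)
  sv <- svd(Yc)
  ef <- sv$v[, seq_len(npc), drop = FALSE]
  structure(
    list(
      mu = mu,
      efunctions = ef,
      scores = Yc %*% ef,
      evalues = sv$d[seq_len(npc)]^2 / (nrow(Y) - 1),
      Y = Y
    ),
    class = "toy_fpca"
  )
}

print.toy_fpca <- function(x, ...) {
  cat("<toy_fpca>", nrow(x$Y), "curves,", ncol(x$efunctions), "components\n")
  invisible(x)
}

# Truncated reconstruction: mean plus scores times eigenfunctions.
fitted.toy_fpca <- function(object, ...) {
  sweep(object$scores %*% t(object$efunctions), 2, object$mu, "+")
}

residuals.toy_fpca <- function(object, ...) {
  object$Y - fitted(object)
}

set.seed(1)
Y <- matrix(rnorm(20 * 30), nrow = 20)   # 20 curves on a 30-point grid
fit <- new_toy_fpca(Y, npc = 2)
print(fit)
dim(residuals(fit))
```

Once methods like these exist, generics such as `fitted()` and `residuals()` dispatch on the class without any further glue code.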

julia-wrobel commented 4 years ago

most modelr functions are wrappers for common S3 methods like predict and residuals so once those are implemented we can use modelr::add_predictions, modelr::add_residuals and quick model quality metrics like mse, mae.

julia-wrobel commented 4 years ago

> not sure why the return object has class fpca and rfr_fpca -- what class hierarchy are we going to have here that requires a superclass rfr_fpca, i.e., what other subclasses than fpca will it have?

probably rfr_fpca won't be necessary. class fpca is necessary for compatibility with refund.shiny. Initially I was concerned that methods like fitted.fpca, predict.fpca might already be taken, but that doesn't seem to be the case.
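
One way to check both points in a session, as a sketch: the object below is a bare placeholder built with `structure()`, not the real return value. `utils::getS3method(..., optional = TRUE)` returns `NULL` when no method of that name is found among the packages currently loaded.

```r
# Toy object carrying both classes discussed above (not the real fit object).
obj <- structure(list(), class = c("fpca", "rfr_fpca"))
inherits(obj, "fpca")
inherits(obj, "rfr_fpca")

# Look for already-registered methods; NULL means the name is still free
# among the packages loaded in this session.
utils::getS3method("predict", "fpca", optional = TRUE)
utils::getS3method("fitted", "fpca", optional = TRUE)
```

The result depends on which packages are attached, so it is worth running with `refund` and `refund.shiny` loaded.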

jeff-goldsmith commented 4 years ago

> most modelr functions are wrappers for common S3 methods like predict and residuals so once those are implemented we can use modelr::add_predictions, modelr::add_residuals and quick model quality metrics like mse, mae.

just want to make sure i'm clear on this -- we can write predict.fpca functions, and then use modelr::add_predictions?

eventually i think we'd like to have something like:

fpca_fit = 
  rfr_fpca(cca, data = dti)

dti %>%
  add_predictions(fpca_fit, var = "cca_fitted") %>%
  ggplot(aes(y = cca_fitted)) + geom_spaghetti()

is that what you have in mind?

julia-wrobel commented 4 years ago

> > most modelr functions are wrappers for common S3 methods like predict and residuals so once those are implemented we can use modelr::add_predictions, modelr::add_residuals and quick model quality metrics like mse, mae.
>
> just want to make sure i'm clear on this -- we can write predict.fpca functions, and then use modelr::add_predictions?
>
> eventually i think we'd like to have something like:
>
> fpca_fit =
>   rfr_fpca(cca, data = dti)
>
> dti %>%
>   add_predictions(fpca_fit, var = "cca_fitted") %>%
>   ggplot(aes(y = cca_fitted)) + geom_spaghetti()
>
> is that what you have in mind?

Yeah, exactly. I think once we have the list of base methods Fabian wrote down that should follow pretty easily, and we won't need to write those functions ourselves or include a dependency on the modelr package. I think it would be good to include unit tests, though, for whatever modelr functions we want people to use so that we get alerted if the package breaks due to modelr updates.
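
A rough base-R sketch of why this should follow from the S3 methods alone. `add_predictions_sketch()` is a hypothetical stand-in written for illustration, not modelr's code, and `lm` on `mtcars` stands in for an fpca fit; the point is only that such a wrapper needs nothing beyond a working `predict()` method that accepts a data frame.

```r
# Hypothetical stand-in for modelr::add_predictions(); not modelr's code.
add_predictions_sketch <- function(data, model, var = "pred") {
  data[[var]] <- predict(model, newdata = data)
  data
}

fit <- lm(mpg ~ wt, data = mtcars)
out <- add_predictions_sketch(mtcars, fit, var = "mpg_fitted")

# The kind of check a unit test could make against the real modelr function.
stopifnot(
  "mpg_fitted" %in% names(out),
  isTRUE(all.equal(unname(out$mpg_fitted), unname(fitted(fit))))
)
```

A unit test in the package could run the same assertions with `modelr::add_predictions()` in place of the stand-in, skipped when modelr is not installed.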

jeff-goldsmith commented 4 years ago

89503cbacf244463b2a875ecae1a2f240d00af47 and related commits fleshed out a predict function. some thoughts on where things stand:

- the newdata argument expects a tf vector, but we probably want that to be a data frame with a column that has the same name as the column used in rfr_fpca. (that's gonna be necessary for the modelr stuff, i think)
- fitted returns a tfd vector, but predict returns a tfb-fpc vector. should those be consistent ...?
- predict needed a way to estimate FPC scores for a new input vector, so i've added an estimate_fpc_scores function. that seems like a first step in modularizing fpca functions -- should we use that everywhere ...?
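
For reference, a minimal sketch of what score estimation for new curves can look like in the simplest setting: dense curves observed without gaps on the same regular grid as the eigenfunctions. `estimate_scores_sketch()` is a hypothetical helper and is not the `estimate_fpc_scores` function from the commit; the inputs are plain matrices and vectors rather than tf vectors.

```r
# Hypothetical helper; NOT the package's estimate_fpc_scores().
# Y_new: curves in rows, grid points in columns.
# mu: mean function on the grid; efunctions: grid points x components.
estimate_scores_sketch <- function(Y_new, mu, efunctions) {
  Yc <- sweep(Y_new, 2, mu)              # center each new curve
  # Least-squares coefficients of the centered curves on the eigenfunctions.
  t(qr.solve(efunctions, t(Yc)))
}

grid <- seq(0, 1, length.out = 50)
mu <- grid^2
efunctions <- cbind(sin(2 * pi * grid), cos(2 * pi * grid))

# Build two noise-free curves from known scores, then recover the scores.
true_scores <- rbind(c(1, -2), c(0.5, 3))
Y_new <- sweep(true_scores %*% t(efunctions), 2, mu, "+")
scores_hat <- estimate_scores_sketch(Y_new, mu, efunctions)
max(abs(scores_hat - true_scores))
```

Irregular or sparse inputs would need a different estimator, which is one argument for keeping this step in its own function.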

jeff-goldsmith commented 1 year ago

been a while since i thought about this, but wanted to pick up on one point: agree that efunctions as a matrix is a step backwards, and functions should always be tf objects.

could reformat output in extract_fpca() so that this is a vector; alternatively, could make a data frame (eigen_df?) that includes a column for the eigenfunction and a column for the eigenvalue.

should also make the mean a tf object. don't know if we want to have this in the same dataframe -- maybe a term column with values mean, fpc1, fpc2 etc, and then a missing value for the mean's eigenvalue ...?
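
A sketch of the data frame layout floated here, in base R only. The column name `fn`, the eigenvalues and the functions are made up for illustration, and the list column of evaluations is just a placeholder for what would be a tf vector column.

```r
grid <- seq(0, 1, length.out = 25)

# One row per term; the mean has no eigenvalue, so it gets NA.
eigen_df <- data.frame(
  term = c("mean", "fpc1", "fpc2"),
  eigenvalue = c(NA, 2.5, 0.8)           # illustrative values only
)

# List column of evaluations as a placeholder for a tf vector column.
eigen_df$fn <- list(
  grid^2,
  sqrt(2) * sin(2 * pi * grid),
  sqrt(2) * cos(2 * pi * grid)
)

# Typical interactions: pick out the components, compute variance shares.
fpcs <- subset(eigen_df, !is.na(eigenvalue))
fpcs$pve <- fpcs$eigenvalue / sum(fpcs$eigenvalue)
fpcs[, c("term", "eigenvalue", "pve")]
```

The cost of the single-table layout shows up in the `!is.na(eigenvalue)` filter: any code that works on the components has to remember to exclude the mean row first.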

more broadly -- what are the ways we expect users to interact with FPCA output, and how should we facilitate that?

tidyfun / refundr

`fpca` renovation #2
