Open fabian-s opened 5 years ago
~return object should contain tf-vectors for e.g. Yhat, efunctions?~ ignore
not sure about efunctions being a matrix. that seems like a step backwards.
proposed change: turn into a tfd-vector
proposed general design principle going forward: objects that represent functional data should always be tf-objects
not sure why the return object has class fpca and rfr_fpca -- what class hierarchy are we going to have here that requires a superclass rfr_fpca, i.e., what other subclasses than fpca will it have?
Methods to implement for fpca-objects:
Base:
[ ] print (!!)
[ ] predict
[ ] summary
[ ] plot
[ ] update (?)
[ ] simulate (?)
Tidyverse:
[ ] autoplot
[ ] tidy (not sure...?)
[ ] glance
[ ] augment: not too sure about what this should do -- will need to think about use cases first
.. what else?
most modelr functions are wrappers for common S3 methods like predict and residuals so once those are implemented we can use modelr::add_predictions, modelr::add_residuals and quick model quality metrics like mse, mae.
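A minimal sketch of why implementing the base methods should be enough for wrapper-style functions. `toy_fpca()` and `add_predictions_sketch()` are hypothetical stand-ins (a plain SVD of curves on a common grid, and a simplified imitation of what a generic wrapper does); neither is the package's actual code, and the matrix column stands in for a tf vector.

```r
# Hypothetical stand-in for an fpca fit: SVD of centered curves on a common grid.
# `data[[col]]` is assumed to be a matrix with one curve per row.
toy_fpca <- function(col, data, npc = 2) {
  Y <- data[[col]]
  mu <- colMeans(Y)
  v <- svd(sweep(Y, 2, mu))$v[, seq_len(npc), drop = FALSE]
  structure(list(col = col, mu = mu, efunctions = v, npc = npc), class = "fpca")
}

print.fpca <- function(x, ...) {
  cat("<fpca> of", x$col, "with", x$npc, "components\n")
  invisible(x)
}

# Project curves onto the FPCs and reconstruct them on the original grid.
predict.fpca <- function(object, newdata, ...) {
  centered <- sweep(newdata[[object$col]], 2, object$mu)
  scores <- centered %*% object$efunctions
  sweep(scores %*% t(object$efunctions), 2, object$mu, "+")
}

# Simplified imitation of a generic wrapper: all it needs is a predict() method.
add_predictions_sketch <- function(data, model, var = "pred") {
  data[[var]] <- predict(model, data)
  data
}

set.seed(1)
dat <- data.frame(id = 1:10)
dat$curves <- matrix(rnorm(10 * 5), nrow = 10)  # 10 curves on a 5-point grid
fit <- toy_fpca("curves", dat, npc = 2)
out <- add_predictions_sketch(dat, fit, var = "curves_fitted")
```

The wrapper never mentions the fpca class; it only calls predict() and relies on S3 dispatch, which is the reason the base methods come first on the list.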
> not sure why the return object has class fpca and rfr_fpca -- what class hierarchy are we going to have here that requires a superclass rfr_fpca, i.e., what other subclasses than fpca will it have?
probably rfr_fpca won't be necessary. class fpca is necessary for compatibility with refund.shiny. Initially I was concerned that methods like fitted.fpca, predict.fpca might already be taken but that doesn't seem to be the case.
> most modelr functions are wrappers for common S3 methods like predict and residuals so once those are implemented we can use modelr::add_predictions, modelr::add_residuals and quick model quality metrics like mse, mae.
just want to make sure i'm clear on this -- we can write predict.fpca functions, and then use modelr::add_predictions?
eventually i think we'd like to have something like:
fpca_fit = rfr_fpca(cca, data = dti)
dti %>%
  add_predictions(fpca_fit, var = "cca_fitted") %>%
  ggplot(aes(y = cca_fitted)) + geom_spaghetti()
is that what you have in mind?
Yeah, exactly. I think once we have the list of base methods Fabian wrote down that should follow pretty easily, and we won't need to write those functions ourselves or include a dependency on the modelr package. I think it would be good to include unit tests, though, for whatever modelr functions we want people to use so that we get alerted if the package breaks due to modelr updates.
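A rough sketch of the kind of check meant here. In the package this would presumably live in a testthat file and use an actual fpca fit; the version below uses base R only, with lm() as a stand-in model and a hypothetical helper name `check_add_predictions()`, and it skips itself when modelr is not installed.

```r
# Regression check: add_predictions() should agree with calling predict() directly.
# lm() stands in for an fpca fit so the sketch runs without the package.
check_add_predictions <- function() {
  if (!requireNamespace("modelr", quietly = TRUE)) {
    return(NA)  # modelr not installed, nothing to check
  }
  fit <- lm(mpg ~ wt, data = mtcars)
  out <- modelr::add_predictions(mtcars, fit, var = "mpg_fitted")
  isTRUE(all.equal(unname(out$mpg_fitted), unname(predict(fit, mtcars))))
}

res <- check_add_predictions()
```

If a modelr update changed how predictions are attached, a check like this would fail and flag the break.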
89503cbacf244463b2a875ecae1a2f240d00af47 and related commits fleshed out a predict function. some thoughts on where things stand:
newdata argument expects a tf vector, but we probably want that to be a data frame with a column that has the same name as the column used in rfr_fpca. (that's gonna be necessary for the modelr stuff, i think)
fitted returns a tfd vector, but predict returns a tfb-fpc vector. should those be consistent ...?
predict needed a way to estimate FPC scores for a new input vector, so i've added an estimate_fpc_scores function. that seems like a first step in modularizing fpca functions -- should we use that everywhere ...?

been a while since i thought about this, but wanted to pick up on one point: agree that efunctions as a matrix is a step backwards, and functions should always be tf objects.
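On the score-estimation point from the predict notes above, a small sketch of what "use that everywhere" could look like. It assumes curves on a common grid and estimates scores by plain projection; the real estimate_fpc_scores may work differently (for example for sparse data), and `estimate_scores_sketch()` and `reconstruct_sketch()` are hypothetical names.

```r
# Hypothetical sketch: score estimation by plain projection on a common grid.
estimate_scores_sketch <- function(Y, mu, efunctions) {
  sweep(Y, 2, mu) %*% efunctions
}

# Shared reconstruction step, so fitted() and predict() could return the same kind of object.
reconstruct_sketch <- function(scores, mu, efunctions) {
  sweep(scores %*% t(efunctions), 2, mu, "+")
}

set.seed(1)
Y <- matrix(rnorm(8 * 6), nrow = 8)  # 8 curves on a 6-point grid
mu <- colMeans(Y)
efunctions <- svd(sweep(Y, 2, mu))$v[, 1:2, drop = FALSE]

# The same helper serves the training curves (fitted) and new curves (predict).
scores_fit <- estimate_scores_sketch(Y, mu, efunctions)
scores_new <- estimate_scores_sketch(Y[1:2, , drop = FALSE], mu, efunctions)
yhat_new <- reconstruct_sketch(scores_new, mu, efunctions)
```

Routing both fitted and predict through one pair of helpers like this would also settle the consistency question, since both would then build their return value the same way.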
could reformat output in extract_fpca() so that this is a vector; alternatively, could make a data frame (eigen_df?) that includes a column for the eigenfunction and a column for the eigenvalue.
should also make the mean a tf object. don't know if we want to have this in the same dataframe -- maybe a term column with values mean, fpc1, fpc2 etc, and then a missing value for the mean's eigenvalue ...?
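A sketch of what that data frame layout might look like. The column names (`term`, `eigenvalue`, `fun`) are assumptions for illustration, a base R list column stands in for a tf vector, and the eigenvalues are grid-level variances from an SVD with no scaling by grid spacing.

```r
# Hypothetical layout for the proposed data frame; a list column stands in for tf vectors.
set.seed(1)
Y <- matrix(rnorm(8 * 6), nrow = 8)  # 8 curves on a 6-point grid
mu <- colMeans(Y)
sv <- svd(sweep(Y, 2, mu))
npc <- 2

eigen_df <- data.frame(
  term = c("mean", paste0("fpc", seq_len(npc))),
  # the mean has no eigenvalue, so its entry is missing
  eigenvalue = c(NA_real_, sv$d[seq_len(npc)]^2 / (nrow(Y) - 1))
)

# One function per row, each evaluated on the grid.
eigen_df$fun <- c(list(mu), lapply(seq_len(npc), function(k) sv$v[, k]))
```

Keeping the mean and the FPCs in one frame makes it easy to filter or plot them together, at the cost of the missing eigenvalue in the mean's row.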
more broadly -- what are the ways we expect users to interact with FPCA output, and how should we facilitate that?
first needs good comments of existing code. @fabian-s @jeff-goldsmith
also needs informed decision of what we actually want to keep.
one good default for regular, one good default for irregular/sparse