tidyfun / tf

S3 classes and methods for tidy functional data
https://tidyfun.github.io/tf/
GNU Affero General Public License v3.0

make tfb_fpc work with (small amounts of) missingness #81

Closed: fabian-s closed this issue 4 months ago

fabian-s commented 4 months ago
library(tf)
pve <- .95
y <- tf_rgp(50, arg = 51L)
y_pc <- tfb_fpc(y, pve = pve)
y_mis <- y |> tf_sparsify(dropout = .05)
y_pc_sparse <- tfb_fpc(y_mis, pve = pve)
## Using softImpute SVD on 5.1% missing data
## Warning message:
## High <pve> with many missings likely to yield bad FPC estimates. 
y_pc_sparse_impute <- y_mis |>
  tf_interpolate(arg = tf_arg(y), evaluator = tf_approx_fill_extend) |>
  tfb_fpc(pve = pve)
y_pc_rebase <- tf_rebase(y_mis, y_pc)
## Warning messages:
## 1: In tf_rebase.tfd.tfb_fpc(y_mis, y_pc) : 
## 2: 6 evaluations were NA, returning irregular tfd. 

layout(t(1:4))
plot(y[1:10], main = "full data")
lines(y_pc[1:10], col = 2, lty = 2)
plot(y[1:10], main = "5% missing")
lines(y_pc_sparse[1:10], col = 2, lty = 2)
plot(y[1:10], main = "from interpolated\n missings")
lines(y_pc_sparse_impute[1:10], col = 2, lty = 2)
plot(y[1:10], main = "scores from \nsparse data")
lines(y_pc_rebase[1:10], col = 2, lty = 2)

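The console message "Using softImpute SVD on 5.1% missing data" suggests that tfb_fpc fills the gaps with a softImpute-style low-rank SVD before extracting FPCs. The base-R sketch below shows only the general idea behind softImpute (iterative SVD with soft-thresholded singular values). It is not tf's code and not what fpc_wsvd does internally. The function name soft_impute_sketch, the penalty lambda, and the stopping rule are hypothetical choices for illustration. It also skips centering, which a real FPCA would handle.

```r
# Hypothetical softImpute-style sketch, NOT tf's implementation:
# alternate between (1) filling missing cells from a low-rank fit and
# (2) recomputing a soft-thresholded SVD of the completed matrix.
# Assumes every column of Y has at least one observed value.
soft_impute_sketch <- function(Y, lambda, max_iter = 100, tol = 1e-6) {
  miss <- is.na(Y)
  Z <- Y
  # initial fill: column means of the observed values
  col_means <- colMeans(Y, na.rm = TRUE)
  Z[miss] <- col_means[col(Y)[miss]]
  for (i in seq_len(max_iter)) {
    s <- svd(Z)
    d <- pmax(s$d - lambda, 0)       # soft-threshold the singular values
    fit <- s$u %*% (d * t(s$v))      # shrunken low-rank reconstruction
    Z_new <- Y
    Z_new[miss] <- fit[miss]         # observed cells stay fixed, missing ones are updated
    delta <- sum((Z_new - Z)^2) / max(sum(Z^2), .Machine$double.eps)
    Z <- Z_new
    if (delta < tol) break           # stop once the relative change is small
  }
  Z
}

# usage: rank-2 "curves" on 51 grid points with 5% of cells dropped
set.seed(1)
n <- 50; p <- 51
t_grid <- seq(0, 1, length.out = p)
Y_full <- outer(rnorm(n), sin(2 * pi * t_grid)) + outer(rnorm(n), cos(2 * pi * t_grid))
Y <- Y_full
Y[sample(length(Y), round(0.05 * length(Y)))] <- NA
Z <- soft_impute_sketch(Y, lambda = 5)
```

Larger lambda shrinks more and yields lower rank. That trade-off is one reason a high pve target combined with many missing values, as in the warning above, is hard to satisfy well.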

See the docs for fpc_wsvd for details. This seems useful for applications, though it probably only works well in cases where simply replacing missing values with interpolated values before the FPCA would work about as well. But at least this means one can now simply do cca_pc <- tfb_fpc(tidyfun::dti_df$cca), for example, without worrying about that 0.5% of missing data.
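For comparison, here is a minimal base-R sketch of the "interpolate first, then FPCA" baseline mentioned above. Each curve's NAs are filled by linear interpolation with constant extension at the boundaries, roughly in the spirit of the tf_interpolate(..., evaluator = tf_approx_fill_extend) step in the example. Whether that evaluator extends exactly this way is an assumption. A plain SVD of the centered matrix then supplies the components, keeping enough of them to reach pve. The helper interpolate_then_fpca is hypothetical. It assumes an equidistant grid and returns unscaled eigenvectors rather than L2-normalized eigenfunctions.

```r
# Hypothetical baseline, NOT tf's implementation: fill NAs per curve by
# linear interpolation (constant extension at the ends), then do PCA on the
# centered, completed matrix and keep enough components to reach `pve`.
# Assumes an equidistant grid `arg` and >= 2 observed values per curve.
interpolate_then_fpca <- function(Y, arg, pve = 0.95) {
  Y_filled <- t(apply(Y, 1, function(y) {
    ok <- !is.na(y)
    approx(arg[ok], y[ok], xout = arg, rule = 2)$y  # rule = 2: constant extension
  }))
  mu <- colMeans(Y_filled)
  Yc <- sweep(Y_filled, 2, mu)                      # center the curves
  s <- svd(Yc)
  prop <- cumsum(s$d^2) / sum(s$d^2)                # cumulative proportion of variance
  k <- which(prop >= pve)[1]
  keep <- seq_len(k)
  list(
    mean   = mu,
    efuns  = s$v[, keep, drop = FALSE],                 # eigenvectors on the grid
    scores = s$u[, keep, drop = FALSE] %*% diag(s$d[keep], k),
    npc    = k,
    filled = Y_filled
  )
}

# usage: rank-2 curves with 5% of cells dropped
set.seed(2)
n <- 50; p <- 51
arg <- seq(0, 1, length.out = p)
Y_full <- outer(rnorm(n), sin(2 * pi * arg)) + outer(rnorm(n), cos(2 * pi * arg))
Y <- Y_full
Y[sample(length(Y), round(0.05 * length(Y)))] <- NA
fit <- interpolate_then_fpca(Y, arg, pve = 0.95)
recon <- sweep(fit$scores %*% t(fit$efuns), 2, fit$mean, "+")
```

For smooth curves with scattered single missing points, this baseline loses very little, which fits the expectation that the missing-data FPCA mostly helps with convenience here.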

@jeff-goldsmith @m-muecke @sebffischer if you have time, could you read the docs / try this out on some data and LMK what's missing/broken?

Somewhat related to #10 as well: tfb_fpc can now, in principle, be applied to irregular data.
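One plausible reading of "in principle" (an assumption, not something stated in the issue) is that irregular curves, once placed on the union of their argument values, become a matrix with many NA cells, which is the kind of input a missing-data SVD can handle. The hypothetical helper irregular_to_matrix below only builds that matrix in base R. It is not a tf function.

```r
# Hypothetical helper, NOT part of tf: stack irregularly observed curves
# onto the sorted union of their argument values. Cells where a curve was
# not observed become NA.
irregular_to_matrix <- function(args, values) {
  grid <- sort(unique(unlist(args)))
  Y <- matrix(NA_real_, nrow = length(args), ncol = length(grid))
  for (i in seq_along(args)) {
    Y[i, match(args[[i]], grid)] <- values[[i]]  # place observed values in their columns
  }
  list(arg = grid, Y = Y)
}

# usage: three curves observed at partly different arguments
res <- irregular_to_matrix(
  args   = list(c(0, 0.5, 1), c(0, 0.25, 1), c(0.5, 0.75, 1)),
  values = list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
)
res$Y
```

When curves share few argument values, most cells end up NA. In that regime, the "High <pve> with many missings" warning from tfb_fpc quoted above becomes relevant.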