make `tfd_irreg` operations more tolerant

fabian-s commented 6 years ago

e.g. stuff like dti$rcst - mean(dti$rcst) should work, or at least dti$rcst - mean(dti$rcst, na.rm = TRUE)

fabian-s commented 6 years ago

will need modifying / replacing fun_op to avoid failure on arg-comparison -- for each obs, the op is only defined on the intersection of args... :(
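To make the "intersection of args" point concrete, here is a minimal base-R sketch. It is not the tf implementation and does not touch fun_op; `op_on_common_args` and the data.frame representation of a curve (columns `arg` and `value`) are assumptions made up for illustration.

```r
# Hypothetical helper, not part of tf: apply a binary op to two irregular
# curves, each stored as a data.frame with columns `arg` and `value`.
op_on_common_args <- function(x, y, op = `-`) {
  common <- intersect(x$arg, y$arg)   # args observed in both curves
  if (length(common) == 0) {
    warning("no args in common; returning an empty curve")
  }
  data.frame(
    arg   = common,
    value = op(x$value[match(common, x$arg)], y$value[match(common, y$arg)])
  )
}

x <- data.frame(arg = c(0, 0.25, 0.5, 1), value = c(1, 2, 3, 4))
y <- data.frame(arg = c(0, 0.5, 0.75, 1), value = c(0.5, 1, 1.5, 2))
res <- op_on_common_args(x, y)   # defined only at arg = 0, 0.5, 1
```

Everything observed in only one of the two curves (here arg = 0.25 and 0.75) is silently dropped, which is why the result can shrink quickly as the data get more irregular.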

jeff-goldsmith commented 5 years ago

Not sure it's related, but

dti_df$cca %>% tf_smooth

works, while

dti_df$cca %>% mean(na.rm = TRUE) %>% tf_smooth

doesn't ...

fabian-s commented 5 years ago

ouch, thx. fixed in https://github.com/fabian-s/tidyfun/commit/3e1b8a60cff94a101536310e133d0e4bb1b7789d

fabian-s commented 2 years ago

see #5, warn about too much irregularity but do it

fabian-s commented 2 years ago

"warn about too much irregularity":

yields a WARNING
enough curves have enough data points in common
no warn if >50% have >50% gridpoints in common (average pointwise coverage)
warn if any grid points exist with <10% coverage (minimal pointwise coverage)
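One possible reading of these two rules, sketched in base R. The helper name `check_irregularity`, its arguments, and the interpretation of "coverage" as the share of curves observed at a grid point are all assumptions; the thresholds 0.5 and 0.1 are taken from the list above.

```r
# Hypothetical helper, not part of tf.
# `args`: list with one numeric vector of observed args per curve.
check_irregularity <- function(args, avg_min = 0.5, point_min = 0.1) {
  grid <- sort(unique(unlist(args)))
  # coverage[j]: share of curves that were observed at grid[j]
  coverage <- vapply(
    grid,
    function(g) mean(vapply(args, function(a) g %in% a, logical(1))),
    numeric(1)
  )
  if (mean(coverage) <= avg_min) {
    warning("average pointwise coverage is only ", round(mean(coverage), 2))
  }
  if (any(coverage < point_min)) {
    warning(sum(coverage < point_min), " grid point(s) have coverage below ", point_min)
  }
  invisible(data.frame(arg = grid, coverage = coverage))
}

# Mostly regular data: one curve has an extra observation at 0.25.
args <- list(c(0, 0.5, 1), c(0, 0.5, 1), c(0, 0.25, 0.5, 1))
cov <- check_irregularity(args)   # no warning expected for this input
```

With fully disjoint args (for example, 12 curves with one unique arg each), both conditions fail and both warnings are raised.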

fabian-s commented 9 months ago

on further thought:

"warn about too much irregularity and do it" will be messy to code and probably still unreliable. it also does a lot of opaque interpolation behind the scenes -- better to make users decide where and how they want to inter/extrapolate by having them explicitly convert irregular data to regular data on a common grid etc.

I no longer think that stuff like dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE) should "just work": way too much magic would need to happen behind the scenes. better to give a fairly clear error (in this case we get "tf_arg(x) and tf_arg(y) are not equal", which could be better but seems informative enough...).

the current implementation (branch 5-NAhandling @ c7e351cf) of e.g. mean(<tfd_irreg>) will do what's expected for somewhat irregular data, IMO:

only return a mean function value for args that are present in all functions:

> tf_rgp(3) |> tf_sparsify() |> mean()
tfd[1] on (0,1) based on 7 to 7 (mean: 7) evaluations each  # input data had 51 !
inter-/extrapolation by tf_approx_linear 
[1]: (0.04, 0.065);(0.26,-0.752);(0.34,-0.676); ...

return an empty "tfd_irreg" for completely irregular data without any grid points in common:

> tf_rgp(3) |> tf_jiggle() |> mean()
empty or missing input `data`; returning prototype of length 0
tfd[1] on (0,0) based on 0 to 0 (mean: 0) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (NULL,NULL)

or take the mean of all available data at each arg with na.rm = TRUE:

> tf_rgp(3) |> tf_sparsify() |> mean(na.rm = TRUE)
tfd[1] on (0,1) based on 44 to 44 (mean: 44) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.00,0.38);(0.02,1.19);(0.04,0.54); ...
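The three behaviours shown above can be mimicked in base R to make the semantics explicit. This is only a sketch of the idea, not the code on the branch; `pointwise_mean` and the data.frame representation of a curve are hypothetical.

```r
# Hypothetical helper, not part of tf.
# `curves`: list of data.frames with columns `arg` and `value`.
pointwise_mean <- function(curves, na.rm = FALSE) {
  grid <- sort(unique(unlist(lapply(curves, `[[`, "arg"))))
  # rows = grid points, columns = curves; NA where a curve has no observation
  values <- do.call(cbind, lapply(curves, function(crv) {
    crv$value[match(grid, crv$arg)]
  }))
  means <- rowMeans(values, na.rm = na.rm)
  keep <- !is.na(means)   # without na.rm, only args shared by all curves survive
  data.frame(arg = grid[keep], value = means[keep])
}

curves <- list(
  data.frame(arg = c(0, 0.5, 1),  value = c(1, 2, 3)),
  data.frame(arg = c(0, 0.25, 1), value = c(3, 4, 5)),
  data.frame(arg = c(0, 0.5, 1),  value = c(5, 6, 7))
)
m_strict <- pointwise_mean(curves)                 # args 0 and 1 only
m_narm   <- pointwise_mean(curves, na.rm = TRUE)   # all four args
```

If no arg is shared by all curves, the strict version returns a zero-row result, which corresponds to the empty "tfd_irreg" case above.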

@jeff-goldsmith have you come across other issues in this vein? I'm having a hard time coming up with test cases for this.

jeff-goldsmith commented 9 months ago

Not sure this is exactly the kind of test cases you have in mind, but here are some settings where we'd have varying degrees of overlap in args across functional observations:

accelerometers that record at the minute level when worn, and people put on / take off at different times
ambulatory blood pressure cuffs which take one observation every ~30 minutes after the start time; some subjects have overlapping data but most are offset from each other (some people start at 8:32, others 9:17, etc)
basically any dataset after registration would have effectively no overlap across subjects
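The blood pressure setting is easy to turn into a small synthetic test case. The sketch below uses made-up start times and plain numeric vectors for the args; turning them into tfd_irreg objects is left out on purpose.

```r
# Start times in minutes after midnight: 8:32, 9:17, 9:32 (made-up values).
starts <- c(s1 = 8 * 60 + 32, s2 = 9 * 60 + 17, s3 = 9 * 60 + 32)
# One reading every 30 minutes for 24 hours after each subject's start.
args <- lapply(starts, function(s) s + 30 * (0:48))

shared_by_all <- Reduce(intersect, args)          # args common to all subjects
shared_s1_s3  <- intersect(args$s1, args$s3)      # s3 starts exactly 60 min after s1
```

Subjects s1 and s3 share most of their args, while s2 is offset by 45 minutes and shares none, so the set of args common to all three subjects is empty.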

One wrinkle on whether dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE) should "just work" -- chf_df$activity - mean(chf_df$activity) works because this is tfd_reg. Users might not immediately get why one works and one doesn't; we may also want to suggest a workflow for "center irregular functional data".
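One candidate workflow, in line with the explicit-conversion approach argued for earlier in the thread, is to interpolate every curve onto a user-chosen common grid and only then center. The base-R sketch below shows the idea with `approx()`; `to_common_grid` is a hypothetical helper, and the choice of grid and of linear interpolation without extrapolation are assumptions the user would have to make deliberately.

```r
# Hypothetical helper, not part of tf: linear interpolation onto `grid`,
# returning NA outside each curve's observed range (no extrapolation).
to_common_grid <- function(curves, grid) {
  do.call(rbind, lapply(curves, function(crv) {
    approx(crv$arg, crv$value, xout = grid, rule = 1)$y
  }))
}

curves <- list(
  data.frame(arg = c(0, 0.5, 1),  value = c(0, 1, 2)),
  data.frame(arg = c(0.25, 0.75), value = c(1, 3))
)
grid <- c(0.25, 0.5, 0.75)
regular  <- to_common_grid(curves, grid)            # one row per curve
centered <- sweep(regular, 2, colMeans(regular))    # subtract the pointwise mean
```

The interpolation step is visible in user code here, so where and how values are filled in is an explicit decision rather than something that happens inside the arithmetic operator.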

fabian-s commented 9 months ago

"users might not immediately get why one works and one doesn't;"

true -- we need to add more doc / warnings for this

"we may also want to suggest a workflow for "center irregular functional data"."

i now think operations like this might actually become easier once tf_rebase is all done -- Ops-methods can then cast a tfd_reg to tfd_irreg on the same args and then perform this kind of thing.
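The casting idea can be illustrated without tf_rebase: evaluate the regular mean function at each curve's own args and subtract it there. This base-R sketch only mirrors the concept; `center_irregular` is hypothetical, and how tf_rebase or the Ops methods will actually do this is not specified in this thread.

```r
# Hypothetical helper, not part of tf: evaluate the mean function at each
# curve's own args (linear interpolation) and subtract it.
center_irregular <- function(curves, mean_arg, mean_value) {
  lapply(curves, function(crv) {
    mu <- approx(mean_arg, mean_value, xout = crv$arg, rule = 1)$y
    data.frame(arg = crv$arg, value = crv$value - mu)
  })
}

# A mean function on a regular grid and two curves on different args.
mean_arg   <- c(0, 0.5, 1)
mean_value <- c(1, 2, 3)
curves <- list(
  data.frame(arg = c(0, 1),       value = c(2, 2)),
  data.frame(arg = c(0.25, 0.75), value = c(1.5, 2.5))
)
centered <- center_irregular(curves, mean_arg, mean_value)
```

Each centered curve keeps its original args, so the result stays irregular and no observation is dropped or moved.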

tidyfun / tf

make `tfd_irreg` operations more tolerant #10