fabian-s opened this issue 6 years ago
Will need modifying/replacing fun_op to avoid failure on the arg comparison: for each observation, the op is only defined on the intersection of args... :(
Not sure it's related, but
dti_df$cca %>% tf_smooth
works, while
dti_df$cca %>% mean(na.rm = TRUE) %>% tf_smooth
doesn't ...
See #5: warn about too much irregularity, but do it anyway.
"warn about too much irregularity":
on further thought:
"warn about too much irregularity and do it" will be messy to code and probably still be unreliable.
It would also do a lot of opaque interpolation of values behind the scenes. Better to make users decide where and how they want to inter-/extrapolate by having them explicitly convert irregular data to regular data on a common grid, etc. I no longer think that something like dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE)
should "just work": to make it work, way too much magic would have to happen behind the scenes. Better to give a fairly clear error (in this case we get "tf_arg(x) and tf_arg(y) are not equal", which could be better but seems informative enough...).
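A minimal base-R sketch of that explicit workflow, under stated assumptions: the irregular data are a plain list of (arg, value) data.frames (a hypothetical stand-in, not how tf stores a tfd_irreg), the user picks the common grid, and stats::approx() does linear interpolation with rule = 1, so values outside an observation's observed range become NA instead of being extrapolated. No tf functions are used.

```r
# Hypothetical toy data: each observation is a data.frame of (arg, value)
# pairs on its own grid (not tf's internal representation).
x <- list(
  data.frame(arg = c(0, 0.3, 0.7, 1), value = c(0, 3, 7, 10)),
  data.frame(arg = c(0.1, 0.4, 0.9),  value = c(2, 5, 10)),
  data.frame(arg = c(0, 0.5, 1),      value = c(1, 6, 11))
)

# Step 1: the user explicitly chooses a common grid.
grid <- seq(0, 1, by = 0.25)

# Step 2: linearly interpolate each observation onto the grid.
# rule = 1 returns NA outside an observation's observed range,
# i.e. nothing is silently extrapolated.
to_grid <- function(obs, grid) {
  approx(obs$arg, obs$value, xout = grid, rule = 1)$y
}
X <- t(vapply(x, to_grid, numeric(length(grid)), grid = grid))  # n_obs x n_grid

# Step 3: center on the grid; NA handling is also an explicit user choice.
mu <- colMeans(X, na.rm = TRUE)
X_centered <- sweep(X, 2, mu)  # subtracts mu from each row
X_centered
```

Every choice that would otherwise be hidden "magic" (grid, interpolation method, extrapolation, NA handling) is visible in the user's own code.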
The current implementation of, e.g., mean(<tfd_irreg>) (branch 5-NAhandling @ c7e351cf)
does what's expected for somewhat irregular data, IMO. It will:
only return a mean function value for args that are present in all functions:
> tf_rgp(3) |> tf_sparsify() |> mean()
tfd[1] on (0,1) based on 7 to 7 (mean: 7) evaluations each # input data had 51 !
inter-/extrapolation by tf_approx_linear
[1]: (0.04, 0.065);(0.26,-0.752);(0.34,-0.676); ...
return an empty "tfd_irreg" for completely irregular data without any grid points in common:
> tf_rgp(3) |> tf_jiggle() |> mean()
empty or missing input `data`; returning prototype of length 0
tfd[1] on (0,0) based on 0 to 0 (mean: 0) evaluations each
inter-/extrapolation by tf_approx_linear
[1]: (NULL,NULL)
or, with na.rm = TRUE, take the mean of all available data at each arg:
> tf_rgp(3) |> tf_sparsify() |> mean(na.rm = TRUE)
tfd[1] on (0,1) based on 44 to 44 (mean: 44) evaluations each
inter-/extrapolation by tf_approx_linear
[1]: (0.00,0.38);(0.02,1.19);(0.04,0.54); ...
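To make the intersection vs. na.rm = TRUE behaviours concrete, here is a toy base-R sketch. irreg_mean() is a hypothetical helper written for illustration, not tf's implementation; it assumes irregular data stored as a list of (arg, value) data.frames and matches args exactly, with no interpolation.

```r
# Hypothetical toy data: each observation is a data.frame of (arg, value)
# pairs on its own grid (not tf's internal representation).
x <- list(
  data.frame(arg = c(0, 0.25, 0.5, 1), value = c(1, 2, 3, 4)),
  data.frame(arg = c(0, 0.5, 0.75, 1), value = c(3, 4, 5, 6)),
  data.frame(arg = c(0, 0.25, 0.5),    value = c(5, 6, 7))
)

# Hypothetical helper: pointwise mean of irregular functions.
# na.rm = FALSE: only args observed in *every* function (intersection).
# na.rm = TRUE:  every arg observed in *any* function (union), averaging
#                whatever values are available at that arg.
# Args are matched exactly (no tolerance, no interpolation).
irreg_mean <- function(x, na.rm = FALSE) {
  all_args <- lapply(x, `[[`, "arg")
  args <- sort(if (na.rm) unique(unlist(all_args)) else Reduce(intersect, all_args))
  # one row per observation, one column per arg; NA where an arg is unobserved
  vals <- matrix(
    unlist(lapply(x, function(obs) obs$value[match(args, obs$arg)])),
    nrow = length(x), ncol = length(args), byrow = TRUE
  )
  data.frame(arg = args, value = colMeans(vals, na.rm = na.rm))
}

irreg_mean(x)                # args 0 and 0.5 only: the intersection
irreg_mean(x, na.rm = TRUE)  # args 0, 0.25, 0.5, 0.75, 1: the union

# Completely disjoint args: the intersection is empty, so 0 rows come back.
y <- list(
  data.frame(arg = c(0.1, 0.6), value = c(1, 2)),
  data.frame(arg = c(0.2, 0.7), value = c(3, 4))
)
irreg_mean(y)
```

The empty result for y plays the role of the empty tfd_irreg prototype in the console output for tf_jiggle() data.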
@jeff-goldsmith have you come across other issues in this vein? I'm having a hard time coming up with test cases for this.
Not sure this is exactly the kind of test case you have in mind, but here are some settings where we'd have varying degrees of overlap in args across functional observations:
One wrinkle on whether dti_df$rcst - mean(dti_df$rcst, na.rm = TRUE)
should "just work": chf_df$activity - mean(chf_df$activity)
works because that one is tf_reg. Users might not immediately get why one works and the other doesn't; we may also want to suggest a workflow for "center irregular functional data".
users might not immediately get why one works and one doesn't;
true -- we need to add more docs/warnings for this.
we may also want to suggest a workflow for "center irregular functional data".
I now think operations like this might actually become easier once tf_rebase is done: the Ops methods can then cast a tfd_reg to a tfd_irreg on the same args and perform this kind of operation there.
E.g., something like
dti$rcst - mean(dti$rcst)
should work, or at least
dti$rcst - mean(dti$rcst, na.rm = TRUE)
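One way to read the "cast to tfd_irreg on the same args" idea, sketched in base R with a hypothetical list-of-data.frames representation and a hypothetical center_irreg() helper (not tf code): compute the na.rm = TRUE-style mean on the union of all args, then look it up at each observation's own args and subtract. Because every observed arg is in the union, no interpolation is needed; a mean defined only on the intersection of args makes the lookup fail, which suggests why the na.rm = TRUE variant is the easier one to support.

```r
# Hypothetical toy data: list of (arg, value) data.frames.
x <- list(
  data.frame(arg = c(0, 0.25, 0.5, 1), value = c(1, 2, 3, 4)),
  data.frame(arg = c(0, 0.5, 0.75, 1), value = c(3, 4, 5, 6)),
  data.frame(arg = c(0, 0.25, 0.5),    value = c(5, 6, 7))
)

# "na.rm = TRUE"-style mean: defined on the union of all observed args,
# averaging whatever values are available at each arg.
long <- do.call(rbind, x)
mu_arg <- sort(unique(long$arg))
mu_val <- vapply(mu_arg, function(a) mean(long$value[long$arg == a]), numeric(1))

# Hypothetical helper: "cast" the mean to each observation's own args and
# subtract. Args are matched exactly; no inter-/extrapolation happens.
center_irreg <- function(x, mu_arg, mu_val) {
  lapply(x, function(obs) {
    idx <- match(obs$arg, mu_arg)
    # a mean defined only on the intersection of args leaves NAs here
    if (anyNA(idx)) stop("mean is not defined on all args of this observation")
    data.frame(arg = obs$arg, value = obs$value - mu_val[idx])
  })
}

x_centered <- center_irreg(x, mu_arg, mu_val)
x_centered[[3]]  # same args as x[[3]], mean-subtracted values
```

Passing an intersection-only mean (args 0 and 0.5 here) to center_irreg() raises an error for the first observation, loosely analogous to the "tf_arg(x) and tf_arg(y) are not equal" error.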