More careful handling of NA in `tf`

tidyfun / tf

S3 classes and methods for tidy functional data

https://tidyfun.github.io/tf/

GNU Affero General Public License v3.0


More careful handling of NA in `tf` #5

Closed: fabian-s closed this issue 5 months ago

fabian-s commented 5 years ago

Things to think about:

Can regular tfds consist entirely of NAs (cf. zoom.tfd when zooming in between two arg values)? Currently: no.
Should it be possible to explicitly define NA values for regions/args where measurements are known to be invalid or not applicable? Currently not possible; this can be handled by suitable evaluators that don't interpolate there and yield NAs instead.

Also: this needs explicit documentation.
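The second point above suggests marking invalid regions through the evaluator instead of storing explicit NA values. A minimal sketch of such an evaluator, written as a standalone, hypothetical base-R function `approx_linear_no_gaps()` (not part of tf): it interpolates linearly but returns NA wherever a requested point lies inside a gap between neighbouring observed args wider than the assumed threshold `max_gap`, or outside the observed range. The argument names `x`, `arg`, `evaluations` are an assumption modelled on what tf's evaluators such as `tf_approx_linear` appear to receive; check the tf documentation on evaluators before plugging this into a `tfd`.

```r
# Hypothetical evaluator sketch, not part of tf.
# x:           points at which to evaluate
# arg:         observed argument values (assumed sorted, at least two)
# evaluations: observed function values at `arg`
# max_gap:     assumed threshold; wider gaps are treated as "no valid data"
approx_linear_no_gaps <- function(x, arg, evaluations, max_gap = 0.15) {
  # plain linear interpolation; rule = 1 returns NA outside range(arg)
  y <- stats::approx(arg, evaluations, xout = x, rule = 1)$y
  # for each x, index i of the observed interval [arg[i], arg[i + 1]] containing it
  left <- findInterval(x, arg, rightmost.closed = TRUE)
  inside <- left >= 1 & left < length(arg)
  # flag points that fall into a gap wider than max_gap
  too_wide <- rep(FALSE, length(x))
  too_wide[inside] <- diff(arg)[left[inside]] > max_gap
  # keep exact hits on observed args, blank out the rest of each wide gap
  y[too_wide & !(x %in% arg)] <- NA
  y
}

# 0.5 lies in the wide 0.2-0.8 gap, so it becomes NA;
# gives (up to rounding) 0.5, NA, 8.5
approx_linear_no_gaps(
  x = c(0.05, 0.5, 0.85),
  arg = c(0, 0.1, 0.2, 0.8, 0.9),
  evaluations = c(0, 1, 2, 8, 9)
)
```

Returning NA inside wide gaps, rather than interpolating across them, expresses "no valid measurements here" without changing how the data themselves are stored.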

fabian-s commented 4 years ago

> mean(data_irreg)
tfd[1] on (1,100) based on Inf to -Inf (mean: NaN) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: NA

This is really ugly; the result should always keep the domain etc.

fabian-s commented 2 years ago

This would boil down to allowing different domains within one tf vector, which we don't want.
Needs more documentation/tests.

fabian-s commented 2 years ago

What should mean/sd etc. actually do for irregular inputs?

> f <- tf_sparsify(tf_rgp(4))
> f
tfd[4] on (0,1) based on 24 to 29 (mean: 26) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.00, 0.68);(0.02, 0.81);(0.04, 0.95); ...
[2]: (0.00, -2.4);(0.04, -2.6);(0.08, -2.8); ...
[3]: (0.00, 0.79);(0.02, 0.82);(0.06, 0.68); ...
[4]: (0.02,-0.84);(0.04,-0.76);(0.06,-0.70); ...
> mean(f)
tfd[1] on (0,1) based on 5 to 5 (mean: 5) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.08,-0.41);(0.18,-0.39);(0.54, 0.13); ...

Check how irregular the grids are and warn if pointwise operations are a bad idea.
Add a message suggesting interpolation to a common grid first.
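Both to-dos could start from a simple overlap measure. In the `mean(f)` output above, the result has only 5 evaluations, which presumably reflects the few arg values shared by all four sparsified functions. The hypothetical base-R helpers below (`common_grid_share()` and `warn_if_irregular()`, neither part of tf) take a list of per-function arg vectors, compute the share of all observed arg values that every function has in common, and warn below an assumed threshold `min_share`.

```r
# Hypothetical helpers, not part of tf.
# args: list of numeric vectors, one vector of argument values per function

# share of all observed argument values that appear in every function's grid
common_grid_share <- function(args) {
  all_args <- unique(unlist(args))
  shared <- Reduce(intersect, args)
  length(shared) / length(all_args)
}

# warn when pointwise summaries would rest on only a few shared points
warn_if_irregular <- function(args, min_share = 0.5) {
  share <- common_grid_share(args)
  if (share < min_share) {
    warning(sprintf(
      paste0("only %.0f%% of argument values are shared by all functions; ",
             "consider interpolating to a common grid first"),
      100 * share
    ), call. = FALSE)
  }
  invisible(share)
}

args <- list(c(0, 0.5, 1), c(0, 0.25, 0.5, 1), c(0, 0.5, 0.75, 1))
common_grid_share(args)  # 3 shared values out of 5 observed: 0.6
```

Exact matching via `intersect()` is deliberately strict; grids stored with rounding error would need a tolerance-based comparison. For tf objects, the remedy the message points to would be interpolating onto a shared grid (e.g. with `tf_interpolate()`) before calling `mean()`; check its documentation for the exact arguments.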

fabian-s commented 8 months ago

also this:

library(tf)
#> 
#> Attaching package: 'tf'
#> The following objects are masked from 'package:stats':
#> 
#>     sd, var

d = data.frame(time = 1, value = NA_real_, id = "1")

x = tfd(d, arg = "time", value = "value", id = "id")

x
#> tfd[0] on (NA,NA)
#> Error in attr(f, "arg")[[1]]: subscript out of bounds

fabian-s commented 6 months ago

The current implementation (branch 5-NAhandling @ c7e351cf) takes care of these, mostly by being more careful about when to return an "empty prototype" and of which kind; see also the comment on #33.

@jeff-goldsmith: similar request as in the other issue: have you come across other problems in this vein? If not, I'm inclined to close this as done for now.

jeff-goldsmith commented 6 months ago

I think the three scenarios I mentioned in the other issue would apply here (irregular device wear times -> some overlap; irregular sampling times in ambulatory BP -> not a lot of overlap; data after registration -> no overlap).

Is the message to users effectively "we're not going to assume a particular model or approach if you have irregular data"? That's probably reasonable and fair, although it might be frustrating in some cases...

fabian-s commented 6 months ago

Quoting @jeff-goldsmith: Is the message to users effectively "we're not going to assume a particular model or approach if you have irregular data"? That's probably reasonable and fair, although it might be frustrating in some cases...

I think so, yes. Let's see if we can come up with ideas that would make it less frustrating without making too many assumptions about how to inter-/extrapolate irregular data.

fabian-s commented 6 months ago

these also seem somewhat suboptimal:

> x <- tf_rgp(2, arg = seq(0, 1, length.out = 11))
> (x*NA)[1]
tfd[1] on (0,1) based on 11 evaluations each
interpolation by tf_approx_linear 
1: NA
> c(x, NA)[3]
tfd[1] on (0,1) based on 11 evaluations each
interpolation by tf_approx_linear 
[1]: (0.0,NULL);(0.1,NULL);(0.2,NULL); ...
> str(c(NA, x))
List of 3
 $  : logi NA
 $ 1: num [1:11] 4.156 3.667 2.643 1.558 0.736 ...
 $ 2: num [1:11] -0.0453 -0.389 -0.6657 -0.91 -0.7647 ...

~~The first two should return identical results (?)~~ EDIT: solved via #77
~~The third loses all tfd attributes, which seems bad...~~ EDIT: a more general vctrs issue, see link

Also related to / affected by #77.

fabian-s commented 6 months ago

Re #77: see `vctrs::vec_detect_missing()` and `vctrs::vec_any_missing()`.

fabian-s commented 6 months ago

NB: using `vec_c()` instead of `c()` gives the desired behavior for `c(NA, x)` etc.
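The `c()` vs `vec_c()` difference can be reproduced without tf, using a toy vctrs class. This sketch assumes the mechanism behind the `str(c(NA, x))` output above: base `c()` dispatches on its first argument, so with a bare `NA` first no class method runs and the class is dropped, whereas `vctrs::vec_c()` treats a logical `NA` as "unspecified" and casts it to the common type. `my_vctr` is a made-up class for illustration; the last line uses `vec_detect_missing()` to locate the NA element.

```r
library(vctrs)

# toy vctrs class standing in for a tf vector (hypothetical, for illustration)
x <- new_vctr(c(1.5, 2.5), class = "my_vctr")

# base c() dispatches on its first argument: a bare NA has no class,
# so the default method runs and the class attribute is lost
class(c(NA, x))

# vec_c() finds the common type; a logical NA counts as "unspecified"
# and is cast to my_vctr, so the class is kept
y <- vec_c(NA, x)
class(y)

# locate the missing element
vec_detect_missing(y)
```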

fabian-s commented 5 months ago

Most of this seems fixed to me; I will open issues for specific edge cases as they come up.