shabbychef / fromo

Fast Robust Moments in R with Rcpp

Trying to get the median with variable window sizes #34

Closed AndreMikulec closed 5 years ago

AndreMikulec commented 5 years ago

Hi,

I am trying to get the median with variable window sizes.

I want non-fixed variable window sizes of

c(3,3,3,2,2)

I try

fromo::t_running_apx_quantiles(c(1,2,4,8,16), 
  window = NULL, variable_win = TRUE, min_df = 2, p = 0.50, 
  time_deltas = c(3,3,3,2,2))

When I run it, I get this:

      [,1]
[1,]   NaN
[2,]   NaN
[3,]   NaN
[4,]   NaN
[5,] 5.134

I think that I am not understanding what variable_win, min_df, and time_deltas mean. Does 'time_deltas' mean the 'variable window sizes'? I am expecting to get:

      [,1]
[1,]   NaN
[2,]   NaN
[3,]   2   # quantile(c(1,2,4), p = 0.5) == 2
[4,]   6   # quantile(c(4,8), p = 0.5) == 6
[5,]   12  # quantile(c(8,16), p = 0.5) == 12

What do I need to do?

Thanks. Andre

shabbychef commented 5 years ago

The t_running_ functions compute, for the most part, over a constant time window, but can do so from different lookback times. So if the data are, say, daily sales, you could compute the standard deviation of sales over the past (calendar) year as of every month end. They cannot compute over different windows, with one exception: when variable_win is true, the computation runs over the period between consecutive lookback dates. If the lookback dates are equal to the times associated with the data, that computation is generally empty. (I should also note that you are only going to get an approximate median from the moments functions, and the approximation will not be great for very small sample sizes.)
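
For an input as small as the one in the question, an exact median over index-based windows of varying length can be computed in base R without fromo. The following is only a minimal sketch, not something fromo provides: the names win and exact_med are made up for illustration, and each window is assumed to be the trailing win[i] observations ending at index i, with NA returned where there is not enough history.

```r
x   <- c(1, 2, 4, 8, 16)
win <- c(3, 3, 3, 2, 2)   # assumed: trailing window length (in observations) at each index

exact_med <- vapply(seq_along(x), function(i) {
  lo <- i - win[i] + 1            # first index of the trailing window
  if (lo < 1) return(NA_real_)    # not enough history yet
  median(x[lo:i])                 # exact median, not a moment-based approximation
}, numeric(1))

print(exact_med)
```

This recomputes each window from scratch, so it costs on the order of sum(win) operations rather than a single pass; that is fine for short series but is not the streaming approach the t_running_ functions take.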

Taking it back a little, if you wanted to sum over the past 3,3,3,2,2 elements you could do, say:

t_running_sd3(c(1,2,4,8,16),time=c(1,2,3,5,7),window=2.5)

which gives me

       [,1]   [,2] [,3]
[1,]    NaN  1.000    1
[2,] 0.7071  1.500    2
[3,] 1.5275  2.333    3
[4,] 2.8284  6.000    2
[5,] 5.6569 12.000    2
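
To see which observations fall into each time window in that call, the same numbers can be reproduced in base R. This is a sketch under two assumptions that are inferred from the output above rather than from the fromo documentation: the three columns are the standard deviation, mean, and count, and an observation at time s belongs to the window ending at time t when t - window < s <= t.

```r
# Inputs copied from the call above.
x      <- c(1, 2, 4, 8, 16)
time   <- c(1, 2, 3, 5, 7)
window <- 2.5

# Assumed membership rule: time[i] - window < s <= time[i].
stats <- t(vapply(seq_along(x), function(i) {
  in_win <- time > time[i] - window & time <= time[i]
  xs <- x[in_win]
  c(sd = sd(xs), mean = mean(xs), n = length(xs))   # sd of one value is NA
}, numeric(3)))

print(round(stats, 4))
```

The counts come out as 1, 2, 3, 2, 2: the gaps of 2 between the last three timestamps are what shrink the final two windows to two observations each.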

I think you could achieve something like what you had in mind by inverting the time_deltas (when time is not given but time_deltas are, we basically compute time=cumsum(time_deltas)), something like:

t_running_sd3(c(1,2,4,8,16),time=NULL,time_deltas=1/c(3,3,3,2,2),window=0.99)

which yields:

       [,1]   [,2] [,3]
[1,]    NaN  1.000    1
[2,] 0.7071  1.500    2
[3,] 1.5275  2.333    3
[4,] 3.0551  4.667    3
[5,] 5.6569 12.000    2
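
The effect of inverting the deltas can be checked by hand in base R. This sketch assumes, as described above, that the implied timestamps are cumsum(time_deltas), and additionally assumes the window ending at time t covers t - window < s <= t; the variable names are illustrative only.

```r
time_deltas <- 1 / c(3, 3, 3, 2, 2)
time   <- cumsum(time_deltas)   # implied timestamps: 1/3, 2/3, 1, 1.5, 2
window <- 0.99

# Number of observations inside the window ending at each timestamp.
n_in_window <- vapply(seq_along(time), function(i) {
  sum(time > time[i] - window & time <= time[i])
}, numeric(1))

print(n_in_window)
```

The counts are 1, 2, 3, 3, 2, matching the third column above. So the fourth window still holds three observations rather than the two requested, which is why this only achieves "something like" the 3,3,3,2,2 pattern.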

shabbychef commented 5 years ago

The overriding principle for the t_running functions is that a single observation in the data can be added to and removed from the 'summed set' at most once each. This is obviously true for a fixed index-based window. It is also true of the time-based windows. However, it could not be guaranteed for the case where the lookback window was an arbitrary function of the index.
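
The add-once, remove-once principle can be illustrated with a plain running sum. This is a sketch of the general two-pointer idea only, not fromo's actual implementation: running_sum is a hypothetical helper, timestamps are assumed to be sorted in increasing order, window is assumed positive, and the window ending at time t is assumed to cover t - window < s <= t.

```r
running_sum <- function(x, time, window) {
  n     <- length(x)
  out   <- numeric(n)
  total <- 0
  left  <- 1L   # index of the oldest observation still in the window
  for (i in seq_len(n)) {
    total <- total + x[i]                      # observation i is added exactly once
    while (time[left] <= time[i] - window) {   # the left edge only ever moves forward
      total <- total - x[left]                 # each observation is removed at most once
      left  <- left + 1L
    }
    out[i] <- total
  }
  out
}

sums <- running_sum(c(1, 2, 4, 8, 16), time = c(1, 2, 3, 5, 7), window = 2.5)
print(sums)
```

The single pass works because the left edge never moves backwards. If the window length were an arbitrary function of the index, a later window could reach back past observations that had already been dropped, and they would have to be added a second time.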

AndreMikulec commented 5 years ago

O.K. Thanks.