tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/

CCF lag labelling for quarterly time series #112

Closed: RoelVerbelen closed this issue 4 years ago

RoelVerbelen commented 4 years ago

First off, thanks for creating this great set of packages for tidy time series analysis in R.

As far as I can tell, CCF() mislabels the lags of quarterly time series: a lag of one observation is labelled as a lag of 4 quarters instead of 1 quarter. The example below compares the output with stats::ccf().

library(tsibbledata)
library(feasts)
#> Loading required package: fabletools
ccf(aus_production$Electricity, aus_production$Gas, lag.max = 3, plot = FALSE)
#> 
#> Autocorrelations of series 'X', by lag
#> 
#>    -3    -2    -1     0     1     2     3 
#> 0.944 0.945 0.962 0.981 0.960 0.943 0.940
aus_production %>% 
  CCF(Electricity, Gas, lag_max = 3)
#> # A tsibble: 7 x 2 [1Q]
#>     lag   ccf
#>   <lag> <dbl>
#> 1  -12Q 0.944
#> 2   -8Q 0.945
#> 3   -4Q 0.962
#> 4    0Q 0.981
#> 5    4Q 0.960
#> 6    8Q 0.943
#> 7   12Q 0.940
aus_production %>% 
  CCF(Electricity, Gas, lag_max = 3) %>% 
  autoplot()

Created on 2020-07-27 by the reprex package (v0.3.0)

I traced it back to this line in the R code, where the lag is multiplied by the frequency() of the data. I'm not sure why that multiplication is needed.
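A minimal base-R sketch of how inflated labels like these can arise, assuming the series reach stats::ccf() as plain vectors (frequency 1), so that its lags are already counted in observations before frequency() is applied. This only illustrates the arithmetic and is not the actual feasts code; it uses the monthly mdeaths and fdeaths series from the datasets package (frequency 12) so it runs without extra packages.

```r
# Base R only: mdeaths and fdeaths are monthly ts objects (frequency 12)
m <- frequency(mdeaths)

# as.numeric() drops the ts attributes, so ccf() sees frequency-1 series
# and returns lags that are already counted in observations
lag_plain <- drop(ccf(as.numeric(mdeaths), as.numeric(fdeaths),
                      lag.max = 3, plot = FALSE)$lag)

# Multiplying those lags by the original frequency inflates every label
# 12-fold, the same pattern as the -12Q ... 12Q labels above (4-fold there)
lag_wrong <- lag_plain * m

print(lag_plain)  # values -3 to 3
print(lag_wrong)  # values -36 to 36 in steps of 12
```

For quarterly data the factor is 4 instead of 12, which matches the -12Q to 12Q labels in the issue.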

mitchelloharawild commented 4 years ago

Thanks, this wasn't intended and has now been fixed.

library(tsibbledata)
library(feasts)
#> Loading required package: fabletools
ccf(aus_production$Electricity, aus_production$Gas, lag.max = 3, plot = FALSE)
#> 
#> Autocorrelations of series 'X', by lag
#> 
#>    -3    -2    -1     0     1     2     3 
#> 0.944 0.945 0.962 0.981 0.960 0.943 0.940
aus_production %>% 
  CCF(Electricity, Gas, lag_max = 3)
#> # A tsibble: 7 x 2 [1Q]
#>     lag   ccf
#>   <lag> <dbl>
#> 1   -3Q 0.944
#> 2   -2Q 0.945
#> 3   -1Q 0.962
#> 4    0Q 0.981
#> 5    1Q 0.960
#> 6    2Q 0.943
#> 7    3Q 0.940
aus_production %>% 
  CCF(Electricity, Gas, lag_max = 3) %>% 
  autoplot()

Created on 2020-07-27 by the reprex package (v0.3.0)

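This was likely introduced when we stopped passing a frequency to ts() in the *CF functions.

Multiplying by frequency() is required when the inputs have a frequency greater than 1, because stats::ccf() then reports lags in units of time (years for monthly data) rather than in numbers of observations: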

ccf(mdeaths, fdeaths, plot = FALSE)$lag
#> , , 1
#> 
#>              [,1]
#>  [1,] -1.25000000
#>  [2,] -1.16666667
#>  [3,] -1.08333333
#>  [4,] -1.00000000
#>  [5,] -0.91666667
#>  [6,] -0.83333333
#>  [7,] -0.75000000
#>  [8,] -0.66666667
#>  [9,] -0.58333333
#> [10,] -0.50000000
#> [11,] -0.41666667
#> [12,] -0.33333333
#> [13,] -0.25000000
#> [14,] -0.16666667
#> [15,] -0.08333333
#> [16,]  0.00000000
#> [17,]  0.08333333
#> [18,]  0.16666667
#> [19,]  0.25000000
#> [20,]  0.33333333
#> [21,]  0.41666667
#> [22,]  0.50000000
#> [23,]  0.58333333
#> [24,]  0.66666667
#> [25,]  0.75000000
#> [26,]  0.83333333
#> [27,]  0.91666667
#> [28,]  1.00000000
#> [29,]  1.08333333
#> [30,]  1.16666667
#> [31,]  1.25000000

Created on 2020-07-27 by the reprex package (v0.3.0)
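To make the rule concrete, here is a sketch of a small helper, ccf_obs_lags(), which is hypothetical and not part of feasts or stats. It multiplies by frequency() only to undo the division that stats::ccf() applies itself, so it returns the same observation-unit lags whether or not the inputs carry a ts frequency. It assumes x and y share the same frequency.

```r
# Hypothetical helper (not part of feasts or stats): cross-correlation lags
# expressed in observations rather than in units of time
ccf_obs_lags <- function(x, y, lag.max = NULL) {
  res <- stats::ccf(x, y, lag.max = lag.max, plot = FALSE)
  # ccf() divides lags by the frequency of the series it receives;
  # frequency() of a plain vector is 1, so that case is left unchanged.
  # round() removes floating-point error from the 1/12 steps.
  round(drop(res$lag) * frequency(x))
}

with_ts <- ccf_obs_lags(mdeaths, fdeaths, lag.max = 3)
no_ts   <- ccf_obs_lags(as.numeric(mdeaths), as.numeric(fdeaths), lag.max = 3)
print(with_ts)
print(no_ts)
```

Both calls should give lags from -3 to 3. The frequency() factor is only correct because it matches the frequency that ccf() divided by; applying it to lags that were never divided reproduces the bug.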

RoelVerbelen commented 4 years ago

Thanks for the lightning-fast response, @mitchelloharawild!

Fuco1 commented 3 years ago

I had the same issue with monthly series (a lag of 1 month was labelled as 12 months). Installing version 0.1.5 fixed the problem. Thanks!