tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/

Different results from tsfeatures #72

Closed robjhyndman closed 5 years ago

robjhyndman commented 5 years ago

I think these should all give the same results.

library(tsfeatures)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Registered S3 methods overwritten by 'forecast':
#>   method             from    
#>   fitted.fracdiff    fracdiff
#>   residuals.fracdiff fracdiff
library(feasts)
#> Loading required package: fabletools
#> 
#> Attaching package: 'feasts'
#> The following objects are masked from 'package:tsfeatures':
#> 
#>     unitroot_kpss, unitroot_pp
set.seed(20190910)
x <- ts(rnorm(100), frequency = 24)

tsfeatures(x,"stl_features", s.window='periodic', robust=TRUE)
#> # A tibble: 1 x 11
#>   nperiods seasonal_period  trend   spike linearity curvature  e_acf1
#>      <dbl>           <dbl>  <dbl>   <dbl>     <dbl>     <dbl>   <dbl>
#> 1        1              24 0.0135 3.45e-4    0.0311     -1.16 -0.0425
#> # … with 4 more variables: e_acf10 <dbl>, seasonal_strength <dbl>,
#> #   peak <dbl>, trough <dbl>
as_tsibble(x) %>% 
  features(value, list(~ feat_stl(., .period=24, s.window='periodic', robust=TRUE)))
#> # A tibble: 1 x 7
#>   trend_strength seasonal_streng… spikiness linearity curvature
#>            <dbl>            <dbl>     <dbl>     <dbl>     <dbl>
#> 1         0.0135            0.161  0.000425    0.0328     -1.23
#> # … with 2 more variables: seasonal_peak_24 <dbl>,
#> #   seasonal_trough_24 <dbl>

tsfeatures(x, "lumpiness")
#> # A tibble: 1 x 1
#>   lumpiness
#>       <dbl>
#> 1     0.139
as_tsibble(x) %>% 
  features(value, list(~ var_tiled_var(., .period=24)))
#> # A tibble: 1 x 1
#>   var_tiled_var
#>           <dbl>
#> 1         0.112

tsfeatures(x, c("max_level_shift","max_var_shift"))
#> # A tibble: 1 x 4
#>   max_level_shift time_level_shift max_var_shift time_var_shift
#>             <dbl>            <dbl>         <dbl>          <dbl>
#> 1           0.358               25         0.962             19
as_tsibble(x) %>%
  features(value, list(
    ~ shift_level_max(., .period = 24, .size = 24),
    ~ shift_var_max(., .period = 24, .size = 24)
  ))
#> # A tibble: 1 x 4
#>   shift_level_max shift_level_index shift_var_max shift_var_index
#>             <dbl>             <dbl>         <dbl>           <dbl>
#> 1           0.377                48          1.07              42

Created on 2019-09-10 by the reprex package (v0.3.0)

mitchelloharawild commented 5 years ago

By default tsfeatures() will scale the time series, which features() does not do.

library(tsfeatures)
library(feasts)
#> Loading required package: fabletools
#> 
#> Attaching package: 'feasts'
#> The following objects are masked from 'package:tsfeatures':
#> 
#>     unitroot_kpss, unitroot_pp
set.seed(20190910)
x <- ts(rnorm(100), frequency = 24)

tsfeatures(x,"stl_features", s.window='periodic', robust=TRUE, scale = FALSE)
#> # A tibble: 1 x 11
#>   nperiods seasonal_period  trend   spike linearity curvature  e_acf1
#>      <dbl>           <dbl>  <dbl>   <dbl>     <dbl>     <dbl>   <dbl>
#> 1        1              24 0.0135 4.25e-4    0.0328     -1.23 -0.0425
#> # … with 4 more variables: e_acf10 <dbl>, seasonal_strength <dbl>,
#> #   peak <dbl>, trough <dbl>
as_tsibble(x) %>% 
  features(value, list(~ feat_stl(., .period=24, s.window='periodic', robust=TRUE)))
#> # A tibble: 1 x 7
#>   trend_strength seasonal_streng… spikiness linearity curvature
#>            <dbl>            <dbl>     <dbl>     <dbl>     <dbl>
#> 1         0.0135            0.161  0.000425    0.0328     -1.23
#> # … with 2 more variables: seasonal_peak_24 <dbl>,
#> #   seasonal_trough_24 <dbl>

Created on 2019-09-10 by the reprex package (v0.3.0)
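
The pattern in the two outputs above (identical trend strength, different spike) can be checked with base R alone. The following is a minimal sketch, assuming that scaling means centring to mean 0 and dividing by the standard deviation, as base `scale()` does. The helpers `stl_parts()`, `trend_strength_sketch()` and `spikiness_sketch()` are hypothetical names written for this illustration; they are not the tsfeatures or feasts implementations. The spikiness definition (variance of the leave-one-out variances of the STL remainder) is likewise an assumption.

```r
# Hypothetical helpers for illustration; not the tsfeatures or feasts code.
stl_parts <- function(y) {
  fit <- stl(y, s.window = "periodic", robust = TRUE)
  list(
    trend     = as.numeric(fit$time.series[, "trend"]),
    remainder = as.numeric(fit$time.series[, "remainder"])
  )
}

# Ratio of variances, so it should not depend on the scale of the data.
trend_strength_sketch <- function(p) {
  max(0, 1 - var(p$remainder) / var(p$trend + p$remainder))
}

# Variance of leave-one-out variances (assumed definition); this is a
# fourth-order quantity, so it should change by a factor of sd^4.
spikiness_sketch <- function(p) {
  n <- length(p$remainder)
  loo <- vapply(seq_len(n), function(i) var(p$remainder[-i]), numeric(1))
  var(loo)
}

set.seed(1)
x <- ts(rnorm(100, mean = 10, sd = 3), frequency = 24)
z <- ts(as.numeric(scale(x)), frequency = 24)  # mean 0, sd 1

raw    <- stl_parts(x)
scaled <- stl_parts(z)

strength_raw    <- trend_strength_sketch(raw)
strength_scaled <- trend_strength_sketch(scaled)
spike_ratio     <- spikiness_sketch(raw) / spikiness_sketch(scaled)

all.equal(strength_raw, strength_scaled)  # scale-free feature
all.equal(spike_ratio, sd(x)^4)           # scale-dependent feature
```

If the assumptions hold, both comparisons agree up to numerical tolerance, which is consistent with only the scale-dependent columns changing once `scale = FALSE` is set.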

Session info

``` r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.5.3 (2019-03-11)
#>  os       Ubuntu 18.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2019-09-10
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  anytime       0.3.6      2019-08-29 [1] CRAN (R 3.5.3)
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.5.1)
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.5.1)
#>  callr         3.3.1      2019-07-18 [1] CRAN (R 3.5.3)
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.5.1)
#>  colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.5.1)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.1)
#>  curl          4.0        2019-07-22 [1] CRAN (R 3.5.3)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.1)
#>  devtools      2.1.0      2019-07-06 [1] CRAN (R 3.5.3)
#>  digest        0.6.20     2019-07-04 [1] CRAN (R 3.5.3)
#>  dplyr         0.8.3      2019-07-04 [1] CRAN (R 3.5.3)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.5.3)
#>  fablelite     0.0.0.9100 2019-07-25 [1] local
#>  fabletools  * 0.1.0.9000 2019-09-10 [1] local
#>  fansi         0.4.0      2018-10-05 [1] CRAN (R 3.5.1)
#>  feasts      * 0.1.0      2019-08-27 [1] local
#>  forecast      8.8        2019-08-21 [1] local
#>  fracdiff      1.4-2      2012-12-02 [1] CRAN (R 3.5.1)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.5.3)
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 3.5.1)
#>  ggplot2       3.2.0      2019-06-16 [1] CRAN (R 3.5.3)
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.5.1)
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 3.5.1)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.5.1)
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.1)
#>  knitr         1.23       2019-05-18 [1] CRAN (R 3.5.3)
#>  lattice       0.20-38    2018-11-04 [1] CRAN (R 3.5.1)
#>  lazyeval      0.2.2      2019-03-15 [1] CRAN (R 3.5.1)
#>  lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.5.3)
#>  lmtest        0.9-37     2019-04-30 [1] CRAN (R 3.5.3)
#>  lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.5.1)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.1)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.1)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 3.5.1)
#>  nlme          3.1-137    2018-04-07 [2] CRAN (R 3.5.3)
#>  nnet          7.3-12     2016-02-02 [2] CRAN (R 3.5.3)
#>  pillar        1.4.2.9001 2019-08-27 [1] Github (r-lib/pillar@82370d7)
#>  pkgbuild      1.0.4      2019-08-05 [1] CRAN (R 3.5.3)
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.1)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.1)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.1)
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.5.3)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.1)
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.5.1)
#>  quadprog      1.5-7      2019-05-06 [1] CRAN (R 3.5.3)
#>  quantmod      0.4-15     2019-06-17 [1] CRAN (R 3.5.3)
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.1)
#>  Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.5.3)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.5.3)
#>  rlang         0.4.0.9002 2019-09-09 [1] Github (r-lib/rlang@c5082e1)
#>  rmarkdown     1.14       2019-07-12 [1] CRAN (R 3.5.3)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.1)
#>  scales        1.0.0      2018-08-09 [1] CRAN (R 3.5.1)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.1)
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.5.1)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.1)
#>  testthat      2.2.1      2019-07-25 [1] CRAN (R 3.5.3)
#>  tibble        2.1.3      2019-06-06 [1] CRAN (R 3.5.3)
#>  tidyr         0.8.3      2019-03-01 [1] CRAN (R 3.5.3)
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.1)
#>  timeDate      3043.102   2018-02-21 [1] CRAN (R 3.5.1)
#>  tseries       0.10-47    2019-06-05 [1] CRAN (R 3.5.3)
#>  tsfeatures  * 1.0.1      2019-04-16 [1] CRAN (R 3.5.3)
#>  tsibble       0.8.3.9000 2019-09-09 [1] Github (tidyverts/tsibble@b7b9339)
#>  TTR           0.23-4     2018-09-20 [1] CRAN (R 3.5.1)
#>  urca          1.3-0      2016-09-06 [1] CRAN (R 3.5.3)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.5.3)
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.1)
#>  vctrs         0.2.0.9002 2019-09-09 [1] Github (r-lib/vctrs@31c35cd)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.1)
#>  xfun          0.8        2019-06-25 [1] CRAN (R 3.5.3)
#>  xts           0.11-2     2018-11-05 [1] CRAN (R 3.5.1)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)
#>  zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.5.1)
#>  zoo           1.8-6      2019-05-28 [1] CRAN (R 3.5.3)
#>
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/local/lib/R/library
```

As for lumpiness() and var_tiled_var(), the latter includes the partial tile window at the end of the series. So if a series has length 100 and the window size is 24, it uses 5 windows (the last with only 4 observations), unlike lumpiness(), which ignores the end of the series. Probably best to ignore partial windows here?
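
The tiling difference can be sketched in base R. This is an illustration only: `tile_variances()` and its `keep_partial` argument are hypothetical names, not functions from tsfeatures or feasts. It splits a series of length 100 into non-overlapping tiles of width 24 and either keeps or drops the short tile at the end.

```r
# Hypothetical helper for illustration; not the tsfeatures or feasts code.
tile_variances <- function(x, width, keep_partial = FALSE) {
  # Start positions of non-overlapping tiles: 1, 25, 49, 73, 97 for n = 100.
  starts <- seq(1, length(x), by = width)
  tiles <- lapply(starts, function(s) x[s:min(s + width - 1, length(x))])
  if (!keep_partial) {
    # Drop any tile shorter than `width` (here the last one, 97:100).
    tiles <- Filter(function(tile) length(tile) == width, tiles)
  }
  vapply(tiles, var, numeric(1))
}

set.seed(20190910)
x <- rnorm(100)

with_partial    <- tile_variances(x, width = 24, keep_partial = TRUE)
without_partial <- tile_variances(x, width = 24)

length(with_partial)     # 5 tiles: four of 24 observations, one of 4
length(without_partial)  # 4 complete tiles

var(with_partial)        # variance of tile variances, short tile included
var(without_partial)     # variance of tile variances, complete tiles only
```

A variance estimated from only 4 observations is much noisier than one estimated from 24, so the short tile can noticeably move the variance of tile variances. Note that the default scaling in tsfeatures() also affects the lumpiness value reported earlier in the thread, so this sketch isolates only the tiling part of the discrepancy.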

As for max_*_shift() and shift_*_max(), only the index/time differs after setting scale = FALSE. This is because tsfeatures reports time as the window number that maximises the shift. Effectively, I think tsfeatures outputs the index of the window's left-most element, whereas feasts reports the index of its right-most element. If a right-aligned sliding window is the right choice here, I think the feasts output is the more appropriate one.
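
The two labelling conventions can be illustrated in base R. This is a sketch under stated assumptions: `max_level_shift_sketch()` is a hypothetical helper, and the way it pairs windows is chosen for illustration rather than taken from either package. The point is only that the same window can be reported by its left-most or its right-most index, and that the two labels differ by the window width minus one.

```r
# Hypothetical helper for illustration; not the tsfeatures or feasts code.
max_level_shift_sketch <- function(x, width) {
  n_windows <- length(x) - width + 1
  # Mean of every sliding window, stored by the window's left-most index.
  means <- vapply(
    seq_len(n_windows),
    function(i) mean(x[i:(i + width - 1)]),
    numeric(1)
  )
  # Absolute change between each window and the one starting `width` later.
  shifts <- abs(diff(means, lag = width))
  k <- which.max(shifts)
  later_left <- k + width  # left-most element of the later window
  c(
    max_shift   = shifts[k],
    left_index  = later_left,             # left-aligned label
    right_index = later_left + width - 1  # right-aligned label
  )
}

set.seed(20190910)
x <- rnorm(100)
res <- max_level_shift_sketch(x, width = 24)

res[["right_index"]] - res[["left_index"]]
#> [1] 23
```

The gap of 23 matches the outputs in the first comment, where the feasts indices (48 and 42) exceed the tsfeatures times (25 and 19) by exactly 23, one less than the window size of 24.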

robjhyndman commented 5 years ago

  1. Can we add a scale argument to feat_stl?
  2. Yes, best to ignore partial windows.
  3. OK

robjhyndman commented 5 years ago

Actually if we add a scale argument anywhere, it should probably be to features(). Would you rather that, or make the user create the scaled data directly like this?

df <- df %>%
  group_by(key) %>%
  mutate(z = scale(value)) %>%
  ungroup()
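
For reference, the per-key scaling in the pipeline above can be sketched in base R without dplyr. The data frame and its column names `key` and `value` are made up for illustration. One detail worth knowing is that `scale()` returns a one-column matrix, so the sketch converts the result to a plain numeric vector with `as.numeric()`.

```r
# Toy data; `key` and `value` are illustrative column names.
df <- data.frame(
  key   = rep(c("a", "b"), each = 5),
  value = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# Centre and scale `value` separately within each key.
# scale() returns a one-column matrix, hence the as.numeric().
df$z <- ave(df$value, df$key, FUN = function(v) as.numeric(scale(v)))

tapply(df$z, df$key, mean)  # 0 within each key (up to rounding)
tapply(df$z, df$key, sd)    # 1 within each key
```

After scaling, the two keys have identical `z` values even though their raw values differ by a factor of 10, which is the property that makes features comparable across series of different magnitudes.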

mitchelloharawild commented 5 years ago

I'd prefer scale to be user controllable, perhaps even:

df %>%
  features(scale(value), ...)

robjhyndman commented 5 years ago

features(scale(value), ...) would be great.

mitchelloharawild commented 5 years ago

Now possible with fabletools.