tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/
291 stars 23 forks source link

STL with window="periodic" not correct #155

Open robjhyndman opened 1 year ago

robjhyndman commented 1 year ago

These two graphs should be the same. The second seems to leave a lot of the seasonality in the remainder series.

library(fpp3)
library(rdwd)

df <-
  readDWD(dataDWD(
    selectDWD(
      name = "Jena (Sternwarte)",
      res = "daily",
      var = "kl",
      per = "historical"
    ),
    read = FALSE
  ), varnames = TRUE) |> 
  tibble() %>%
  mutate(date = as.Date(MESS_DATUM)) %>%
  filter(date >= as.Date("2005-01-01")) %>%
  rename(temperature = TMK.Lufttemperatur) |> 
  as_tsibble(index = date) %>%
  select(date, temperature) 

df |> 
  model(STL(temperature ~ season(period = "year", window=999999), robust = TRUE)) |>
  components() |>
  autoplot()


df |> 
  model(STL(temperature ~ season(period = "year", window="periodic"), robust = TRUE)) |>
  components() |>
  autoplot()

Created on 2022-10-26 with reprex v2.0.2

mitchelloharawild commented 1 year ago

This is resulting from the non-integer seasonality... strange.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rdwd)
dat <- readDWD(dataDWD(
  selectDWD(
    name = "Jena (Sternwarte)",
    res = "daily",
    var = "kl",
    per = "historical"
  ),
  read = FALSE
), varnames = TRUE) %>% 
  tibble() %>%
  mutate(date = as.Date(MESS_DATUM)) %>%
  filter(date >= as.Date("2005-01-01")) %>%
  rename(temperature = TMK.Lufttemperatur) |> 
  pull(temperature) %>% 
  ts(frequency = 365.25)
#> Warning: -> pull -> ts -> is.data.frame -> pull -> readDWD -> rename -> filter
#> -> mutate -> tibble -> tibble_quos -> eval_tidy -> readDWD -> dataDWD: In late
#> 2022, dir will default to locdir(). From then on, use dir='DWDdata' explicitely
#> to store in a project-specific folder.
#>  ->  pull -> ts -> is.data.frame -> pull -> readDWD -> rename -> filter -> mutate -> tibble -> tibble_quos -> eval_tidy -> readDWD -> dataDWD -> dirDWD: creating directory '/tmp/RtmpjWu3DN/reprex-111ac5dedcc05-awful-dog/DWDdata'
#>  ->  pull -> ts -> is.data.frame -> pull -> readDWD -> rename -> filter -> mutate -> tibble -> tibble_quos -> eval_tidy -> readDWD -> dataDWD -> newFilename: creating 1 file:
#> '/tmp/RtmpjWu3DN/reprex-111ac5dedcc05-awful-dog/DWDdata/daily_kl_historical_tageswerte_KL_02444_18240101_20211231_hist.zip'
#> Reading 1 file with readDWD.data() and fread=TRUE ...

plot(stl(dat, s.window = 999999, robust = TRUE))

plot(stl(dat, s.window = "periodic", robust = TRUE))


plot(forecast::mstl(dat, s.window = 999999, robust = TRUE))
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo

plot(forecast::mstl(dat, s.window = "periodic", robust = TRUE))

Created on 2022-10-26 by the reprex package (v2.0.1)

mitchelloharawild commented 1 year ago

I've checked the subsequent call to Fortran and the only difference in the calls are the s.window and s.jump.

When periodic is set, s.window <- 10 * n + 1 and for this series n is 6209. s.jump = ceiling(s.window/10).

robjhyndman commented 1 year ago

That's weird. Perhaps we should round the period before calling stats::stl().

robjhyndman commented 1 year ago

Probably better to replace "periodic" with some very large odd number when calling stats::stl()