tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

ARIMA error with categorical predictor #181

Closed: mpjashby closed this issue 5 years ago

mpjashby commented 5 years ago

I am modelling data observed every 8 hours. Some models cannot capture multiple seasonality automatically, so for those I would like to include a dummy variable for day of the week in addition to the 8-hour seasonal period that the season() special extracts from the data.

This approach works for TSLM() but not for ARIMA(), which fails whenever the model includes a categorical variable. For example, in the ARIMA models below, including either Weekday (an ordered factor) or FirstLast (a character vector) as a predictor produces a null model.

library(fable)
#> Loading required package: fablelite
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:lubridate':
#> 
#>     interval, new_interval
library(tidyverse)

test_models <- tsibbledata::vic_elec %>% 
    filter(Time < ymd_hm("2012-02-01 00:00")) %>% 
    as_tibble() %>% 
    mutate(Time = floor_date(Time, "8 hours")) %>% 
    group_by(Time) %>% 
    summarise(Demand = mean(Demand), Temperature = mean(Temperature), 
                        Holiday = first(Holiday)) %>% 
    mutate(
        Weekday = wday(Time, label = TRUE),
        FirstLast = case_when(
            mday(Time) == 1 ~ "first", 
            mday(Time) == days_in_month(Time) ~ "last",
            TRUE ~ "other"
        )
    ) %>% 
    as_tsibble(index = Time) %>% 
    model(
        tslm = TSLM(Demand ~ trend() + season() + Weekday + Temperature + Holiday),
        works = ARIMA(Demand ~ trend() + season() + Temperature + Holiday),
        doesnt = ARIMA(Demand ~ trend() + season() + Weekday + Temperature + Holiday),
        also_doesnt = ARIMA(Demand ~ trend() + season() + FirstLast + Temperature + Holiday),
    )
#> Warning in FUN(newX[, i], ...): NAs introduced by coercion

#> Warning in FUN(newX[, i], ...): NAs introduced by coercion

#> Warning in FUN(newX[, i], ...): NAs introduced by coercion

#> Warning in FUN(newX[, i], ...): NAs introduced by coercion
#> Warning: 1 error encountered for doesnt
#> [1] infinite or missing values in 'x'
#> Warning: 1 error encountered for also_doesnt
#> [1] infinite or missing values in 'x'

test_models
#> # A mable: 1 x 4
#>   tslm    works                                 doesnt       also_doesnt 
#>   <model> <model>                               <model>      <model>     
#> 1 <TSLM>  <LM w/ ARIMA(0,0,2)(0,0,1)[3] errors> <NULL model> <NULL model>

My usual apologies if I've overlooked something obvious!

Created on 2019-07-25 by the reprex package (v0.3.0)
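
The warnings in the reprex above ("NAs introduced by coercion", then "infinite or missing values in 'x'") are consistent with a categorical column being coerced to numeric somewhere in the regressor matrix. That is a reading of the messages, not something confirmed in this thread. A minimal base R sketch of the mechanism, using hypothetical toy data and an unordered factor, and of the usual remedy of expanding the factor into dummy columns with model.matrix():

```r
# Toy data (hypothetical values): one numeric and one categorical predictor.
df <- data.frame(
  Temperature = c(20.1, 22.3, 19.8, 25.0),
  Weekday = factor(c("Mon", "Tue", "Wed", "Mon"), levels = c("Mon", "Tue", "Wed"))
)

# as.matrix() on mixed column types yields a character matrix...
char_mat <- as.matrix(df)

# ...and coercing each column to numeric turns the weekday labels into NA.
# (Without suppressWarnings() this emits "NAs introduced by coercion".)
coerced <- suppressWarnings(apply(char_mat, 2, as.numeric))

# model.matrix() instead expands the factor into 0/1 dummy columns.
# The intercept column is dropped; "Mon" becomes the reference level.
dummies <- model.matrix(~ Temperature + Weekday, data = df)[, -1, drop = FALSE]

anyNA(coerced)  # the coerced matrix contains missing values
anyNA(dummies)  # the dummy-coded matrix does not
```

Note that wday(label = TRUE) returns an ordered factor, for which model.matrix() uses polynomial contrasts by default rather than the 0/1 coding shown here.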

mitchelloharawild commented 5 years ago

Fixed, thanks! Great clear reprex :+1:

mitchelloharawild commented 5 years ago

library(fable)
#> Loading required package: fablelite
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:lubridate':
#> 
#>     interval, new_interval
library(tidyverse)

test_models <- tsibbledata::vic_elec %>% 
  filter(Time < ymd_hm("2012-02-01 00:00")) %>% 
  as_tibble() %>% 
  mutate(Time = floor_date(Time, "8 hours")) %>% 
  group_by(Time) %>% 
  summarise(Demand = mean(Demand), Temperature = mean(Temperature), 
            Holiday = first(Holiday)) %>% 
  mutate(
    Weekday = wday(Time, label = TRUE),
    FirstLast = case_when(
      mday(Time) == 1 ~ "first", 
      mday(Time) == days_in_month(Time) ~ "last",
      TRUE ~ "other"
    )
  ) %>% 
  as_tsibble(index = Time) %>% 
  model(
    tslm = TSLM(Demand ~ trend() + season() + Weekday + Temperature + Holiday),
    works = ARIMA(Demand ~ trend() + season() + Temperature + Holiday),
    doesnt = ARIMA(Demand ~ trend() + season() + Weekday + Temperature + Holiday),
    also_doesnt = ARIMA(Demand ~ trend() + season() + FirstLast + Temperature + Holiday),
  )

test_models
#> # A mable: 1 x 4
#>   tslm   works                 doesnt               also_doesnt            
#>   <mode> <model>               <model>              <model>                
#> 1 <TSLM> <LM w/ ARIMA(0,0,2)(… <LM w/ ARIMA(0,0,0)… <LM w/ ARIMA(0,0,2)(0,…

Created on 2019-07-25 by the reprex package (v0.3.0)
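
In the output above, all four models now fit, with the categorical predictors entering as regressors alongside ARIMA errors. As a rough illustration of that kind of model outside fable, the base R sketch below fits a regression with AR(1) errors using stats::arima() and weekday dummy columns. The simulated series, the effect sizes, and the fixed AR(1) order are all assumptions made for illustration; this is not how fable implements the fix.

```r
set.seed(1)

# Simulated data (assumed, not vic_elec): 8 weeks, 3 observations per day.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
weekday <- factor(rep(rep(days, each = 3), times = 8), levels = days)

# Hypothetical weekday effects plus AR(1) noise around a level of 100.
effects <- c(0, 2, 3, 3, 2, -4, -5)
noise <- arima.sim(model = list(ar = 0.5), n = length(weekday))
y <- 100 + effects[as.integer(weekday)] + as.numeric(noise)

# Expand the factor into 0/1 dummy columns. The intercept column is dropped
# because arima() adds its own intercept when include.mean = TRUE (the default).
xreg <- model.matrix(~ weekday)[, -1, drop = FALSE]

# Regression on the weekday dummies with AR(1) errors.
fit <- arima(y, order = c(1, 0, 0), xreg = xreg, method = "ML")
coef(fit)
```

The coefficient vector contains the AR term, the intercept, and one coefficient per non-reference weekday.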