robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

datetime incorrect when using forecast() for msts() #750

Closed englianhu closed 5 years ago

englianhu commented 5 years ago

Following https://dzone.com/articles/seasonal-periods, I am trying to fit a model with multiple seasonal periods.

Dataset:

> smp %>% head
                       open    high     low    close
2015-01-05 00:01:00 120.504 120.542 120.520 120.5550
2015-01-05 00:02:00 120.558 120.570 120.566 120.5740
2015-01-05 00:03:00 120.596 120.588 120.588 120.5900
2015-01-05 00:04:00 120.592 120.606 120.566 120.6035
2015-01-05 00:05:00 120.606 120.594 120.506 120.5870
2015-01-05 00:06:00 120.585 120.576 120.506 120.5510
> smp %>% tail
                        open    high     low    close
2015-01-12 23:54:00 118.3560 118.354 118.344 118.3505
2015-01-12 23:55:00 118.3500 118.353 118.352 118.3520
2015-01-12 23:56:00 118.3510 118.358 118.352 118.3545
2015-01-12 23:57:00 118.3555 118.354 118.355 118.3525
2015-01-12 23:58:00 118.3520 118.365 118.356 118.3520
2015-01-12 23:59:00 118.3510 118.356 118.356 118.3550

Function:

> multi_seasons <- function(mbase, seasonal_periods = c(1440, 7200), auto_arima = FALSE, 
                          start = decimal_date(as_datetime('2016-01-05 00:00:00'))) {
  require('forecast')
  mbase <- msts(tk_ts(mbase), seasonal.periods = seasonal_periods, start = start)

  if (auto_arima == TRUE) {
    fit <- auto.arima(mbase, D = 1)
  } else {
    fit <- tbats(mbase)
  }
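  # Note: forecast() has no `ahead` argument (the horizon argument is `h`),
  # so `ahead = 1` is ignored and the default horizon is used.
  forecast(fit, ahead = 1)
}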

Modelling:

> mts <- multi_seasons(smp)
> mts
           Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2020.81037       118.3570 118.3193 118.3948 118.2993 118.4148
2020.81051       118.3590 118.3048 118.4132 118.2762 118.4419
2020.81065       118.3610 118.2943 118.4277 118.2591 118.4631
2020.81079       118.3630 118.2858 118.4402 118.2450 118.4811
2020.81093       118.3650 118.2786 118.4515 118.2329 118.4973
2020.81107       118.3670 118.2722 118.4618 118.2221 118.5121
2020.81121       118.3690 118.2666 118.4715 118.2124 118.5258
2020.81135       118.3710 118.2614 118.4806 118.2035 118.5387
2020.81148       118.3730 118.2567 118.4893 118.1952 118.5509
2020.81162       118.3749 118.2524 118.4976 118.1876 118.5626
2020.81176       118.3769 118.2483 118.5056 118.1803 118.5738
2020.81190       118.3789 118.2446 118.5133 118.1735 118.5846
2020.81204       118.3808 118.2410 118.5208 118.1671 118.5950
2020.81218       118.3828 118.2377 118.5281 118.1609 118.6051
2020.81232       118.3848 118.2345 118.5352 118.1551 118.6149
2020.81246       118.3867 118.2315 118.5421 118.1495 118.6244
2020.81260       118.3886 118.2287 118.5488 118.1441 118.6337
2020.81273       118.3906 118.2259 118.5554 118.1389 118.6428
2020.81287       118.3925 118.2233 118.5619 118.1339 118.6516
2020.81301       118.3944 118.2208 118.5682 118.1291 118.6603
2020.81315       118.3963 118.2184 118.5744 118.1244 118.6688
2020.81329       118.3982 118.2161 118.5805 118.1199 118.6771
2020.81343       118.4000 118.2139 118.5865 118.1155 118.6853
2020.81357       118.4019 118.2117 118.5923 118.1112 118.6933
2020.81371       118.4037 118.2096 118.5981 118.1070 118.7012
> mts %>% timetk::tk_index %>% head
[1] 2016.011 2016.011 2016.011 2016.011 2016.011 2016.012

I noticed that the datetime index of the forecast is not correct.

> mts %>% tk_tbl
# A tibble: 14,400 x 6
   index `Point Forecast` `Lo 80` `Hi 80` `Lo 95` `Hi 95`
   <dbl>            <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 2021.             118.    118.    118.    118.    118.
 2 2021.             118.    118.    118.    118.    118.
 3 2021.             118.    118.    118.    118.    118.
 4 2021.             118.    118.    118.    118.    118.
 5 2021.             118.    118.    118.    118.    118.
 6 2021.             118.    118.    118.    118.    119.
 7 2021.             118.    118.    118.    118.    119.
 8 2021.             118.    118.    118.    118.    119.
 9 2021.             118.    118.    118.    118.    119.
10 2021.             118.    118.    118.    118.    119.
# ... with 14,390 more rows
> mts %>% tk_tbl %>% mutate(index = as_datetime(index, origin = '2015-01-01'))
# A tibble: 14,400 x 6
   index               `Point Forecast` `Lo 80` `Hi 80` `Lo 95` `Hi 95`
   <dttm>                         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 2015-01-01 00:33:40             118.    118.    118.    118.    118.
 2 2015-01-01 00:33:40             118.    118.    118.    118.    118.
 3 2015-01-01 00:33:40             118.    118.    118.    118.    118.
 4 2015-01-01 00:33:40             118.    118.    118.    118.    118.
 5 2015-01-01 00:33:40             118.    118.    118.    118.    118.
 6 2015-01-01 00:33:40             118.    118.    118.    118.    119.
 7 2015-01-01 00:33:40             118.    118.    118.    118.    119.
 8 2015-01-01 00:33:40             118.    118.    118.    118.    119.
 9 2015-01-01 00:33:40             118.    118.    118.    118.    119.
10 2015-01-01 00:33:40             118.    118.    118.    118.    119.
# ... with 14,390 more rows
> as_datetime(2020.81037, origin = '2015-01-01')
[1] "2015-01-01 00:33:40 UTC"
> as_datetime(2020.81051, origin = '2015-01-01')
[1] "2015-01-01 00:33:40 UTC"
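
For reference, a number passed to a datetime conversion is read as seconds since `origin`, which is why 2020.81037 lands about 33 minutes 40 seconds past the origin in every row above. The following is a minimal base-R sketch of that behaviour, together with one possible way to rebuild minute-level timestamps for forecast rows from the last observed time. The names `last_obs`, `h`, and `fc_times` are illustrative assumptions, not part of the original post.

```r
# A bare number is interpreted as seconds since `origin`,
# so 2020.81037 means roughly 33 min 40 s after the origin.
x <- as.POSIXct(2020.81037, origin = "2015-01-01", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S")

# One possible workaround (an assumption, not the package's method):
# build the forecast timestamps directly from the last observed minute.
last_obs <- as.POSIXct("2015-01-12 23:59:00", tz = "UTC")
h <- 5  # illustrative horizon, in minutes
fc_times <- seq(last_obs + 60, by = "1 min", length.out = h)
format(fc_times, "%Y-%m-%d %H:%M:%S")
```

The generated `fc_times` could then replace the numeric index column of the forecast table, assuming the forecasts are consecutive one-minute steps.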

robjhyndman commented 5 years ago

Can you please share the data so we can easily reproduce the problem?

englianhu commented 5 years ago

The sample dataset above is attached as smp.zip; the complete dataset is at https://github.com/scibrokes/real-time-fxcm/blob/master/data/USDJPY/data_tm1.rds.

robjhyndman commented 5 years ago

smp has four columns, but tbats() requires a univariate time series. The four columns are being turned into a single time series, and the forecasts are produced from that combined series.

Use multi_seasons(smp[,1]).
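
A minimal base-R sketch of the difference, using a small synthetic stand-in for `smp` (the object `smp_demo` and its values are made up for illustration). Coercing the whole matrix to a plain vector strings all four columns together, whereas selecting one column keeps a single series. The sketch also shows that a `ts` index advances by 1/frequency per observation, so with a frequency of 7200 one index unit spans 7200 observations rather than one calendar year.

```r
# Synthetic stand-in for smp: 10 rows, four price columns (illustrative values).
smp_demo <- matrix(seq_len(40), ncol = 4,
                   dimnames = list(NULL, c("open", "high", "low", "close")))

# Flattening the matrix yields one vector holding all four columns.
flat <- as.numeric(smp_demo)
length(flat)

# Selecting a single column keeps one univariate series.
open_only <- smp_demo[, "open"]
length(open_only)

# The time index of a ts object moves by 1/frequency per observation.
y <- ts(open_only, start = 2016, frequency = 7200)
deltat(y)
```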

robjhyndman commented 5 years ago

@mitchelloharawild Perhaps add a check in tbats() to issue an error in this case.
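
As a rough illustration of what such a check could look like, here is a hypothetical helper in base R. The name `assert_univariate` and the error message are assumptions for this sketch; it is not the check used in the forecast package.

```r
# Hypothetical guard: stop early when the input has more than one column.
assert_univariate <- function(y) {
  if (NCOL(y) > 1) {
    stop("A univariate time series is required; got ", NCOL(y), " columns.",
         call. = FALSE)
  }
  invisible(y)
}

# Usage: a four-column matrix triggers the error, a plain vector passes.
res <- tryCatch(assert_univariate(matrix(1:8, ncol = 4)),
                error = function(e) conditionMessage(e))
res
ok <- assert_univariate(1:5)
```

Calling a guard like this at the top of a wrapper such as `multi_seasons()` would surface the problem before any model is fitted.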