tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

"I can't use NNETAR to forecast with missing values near the end of the series." #326

Closed VicenteYago closed 3 years ago

VicenteYago commented 3 years ago

Hi, I'm getting the following error with fable::NNETAR():

library(tidyverse)
df <- structure(list(day = structure(c(18669, 18670, 18671, 18672, 
18673, 18674, 18675, 18676, 18677, 18678, 18679, 18680, 18681, 
18682, 18683), class = "Date"), et0 = c(2.14246389897611, 3.11190679332797, 
2.32110501161407, 1.81745745040224, 1.57074577539245, 1.70213549178058, 
2.27715557354997, 1.93251964636276, 2.02079558594814, 1.67034979894605, 
1.62815330830132, 2.7200060919322, 1.91859668337379, 2.22514374138325, 
1.65308090579251)), row.names = c(NA, -15L), key = structure(list(
    .rows = structure(list(1:15), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), index = structure("day", ordered = TRUE), index2 = "day", interval = structure(list(
    year = 0, quarter = 0, month = 0, week = 0, day = 1, hour = 0, 
    minute = 0, second = 0, millisecond = 0, microsecond = 0, 
    nanosecond = 0, unit = 0), .regular = TRUE, class = c("interval", 
"vctrs_rcrd", "vctrs_vctr")), class = c("tbl_ts", "tbl_df", "tbl", 
"data.frame"))
#print(df)
# A tsibble: 15 x 2 [1D]
   day          et0
   <date>     <dbl>
 1 2021-02-11  2.14
 2 2021-02-12  3.11
 3 2021-02-13  2.32
 4 2021-02-14  1.82
 5 2021-02-15  1.57
 6 2021-02-16  1.70
 7 2021-02-17  2.28
 8 2021-02-18  1.93
 9 2021-02-19  2.02
10 2021-02-20  1.67
11 2021-02-21  1.63
12 2021-02-22  2.72
13 2021-02-23  1.92
14 2021-02-24  2.23
15 2021-02-25  1.65

For example, it works with the first 8 rows:

df[1:8,] %>% fabletools::model(nnetar = fable::NNETAR(et0)) %>% fabletools::forecast(h = "1 day")
# A fable: 1 x 4 [1D]
# Key:     .model [1]
  .model day                 et0 .mean
  <chr>  <date>           <dist> <dbl>
1 nnetar 2021-02-19 sample[5000]  1.94
Warning message:
Series too short for seasonal lags 

But it fails with the first 9 rows:

df[1:9,] %>% fabletools::model(nnetar = fable::NNETAR(et0)) %>% fabletools::forecast(h = "1 day")
Error: Problem with `mutate()` input `nnetar`.
x Problem with `mutate()` input `.sim`.
x I can't use NNETAR to forecast with missing values near the end of the series.
ℹ Input `.sim` is `sim_nnetar(.innov)`.
ℹ Input `nnetar` is `(function (object, ...) ...`.
Run `rlang::last_error()` to see where the error occurred.

NNETAR() fails with these inputs until at least 14 rows are fed to it. The error makes no sense to me, since the time series has no missing values.

I don't know whether this error is expected because the input is too short, or whether something else is going on.
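As a quick sanity check on the "no missing values" claim, here is a minimal base R sketch. It rebuilds the series from the printed values above (rounded to two decimals, so not the exact data) and looks for both explicit NAs and implicit gaps in the daily index. The variable names `any_na` and `has_gap` are just for illustration; on the tsibble itself, `tsibble::has_gaps()` should give the same answer for the gap check.

```r
# Rebuild the series from the printed values (rounded to two decimals).
day <- as.Date("2021-02-11") + 0:14
et0 <- c(2.14, 3.11, 2.32, 1.82, 1.57, 1.70, 2.28, 1.93,
         2.02, 1.67, 1.63, 2.72, 1.92, 2.23, 1.65)

# Explicit missing values: is there any NA in the measured variable?
any_na <- anyNA(et0)

# Implicit gaps: every consecutive pair of dates should be exactly one day apart.
has_gap <- any(diff(day) != 1)

cat("NA values:", any_na, "; gaps in daily index:", has_gap, "\n")
```

Both checks come back FALSE for this data, so the "missing values" in the error message do not come from the input series.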

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1       stringr_1.4.0       dplyr_1.0.4         purrr_0.3.4         readr_1.4.0        
 [6] tidyr_1.1.2         tibble_3.0.6        ggplot2_3.3.3       tidyverse_1.3.0     predictiveET0_0.1.0

loaded via a namespace (and not attached):
 [1] zoo_1.8-8               tidyselect_1.1.0        urca_1.3-0              haven_2.3.1            
 [5] tsibble_0.9.3           lattice_0.20-41         colorspace_2.0-0        vctrs_0.3.6            
 [9] generics_0.1.0          utf8_1.1.4              rlang_0.4.10            pillar_1.4.7           
[13] withr_2.4.1             glue_1.4.2              DBI_1.1.1               dbplyr_2.1.0           
[17] readxl_1.3.1            modelr_0.1.8            distributional_0.2.2    lifecycle_0.2.0        
[21] cellranger_1.1.0        munsell_0.5.0           anytime_0.3.9           gtable_0.3.0           
[25] progressr_0.7.0         rvest_0.3.6             curl_4.3                fansi_0.4.2            
[29] broom_0.7.4             Rcpp_1.0.6              weathermetrics_1.2.2    scales_1.1.1           
[33] backports_1.2.1         Evapotranspiration_1.15 fable_0.3.0             jsonlite_1.7.2         
[37] fs_1.5.0                farver_2.0.3            hms_1.0.0               digest_0.6.27          
[41] stringi_1.5.3           ET.PenmanMonteith_0.1.0 grid_4.0.3              cli_2.3.0              
[45] tools_4.0.3             magrittr_2.0.1          feasts_0.1.7            fabletools_0.3.0       
[49] crayon_1.4.1            pkgconfig_2.0.3         ellipsis_0.3.1          xml2_1.3.2             
[53] reprex_1.0.0            lubridate_1.7.9.2       assertthat_0.2.1        httr_1.4.2             
[57] rstudioapi_0.13         R6_2.5.0                nnet_7.3-15             nlme_3.1-152           
[61] compiler_4.0.3         

>packageVersion("fable")
[1] ‘0.3.0’

>packageVersion("fabletools")
[1] ‘0.3.0’

mitchelloharawild commented 3 years ago

Should work now, thanks for the reproducible bug report. The issue was related to using a short time series - instead of storing recent data from the full series length, the code used the model's response (which is a bit shorter due to lagged responses).
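To illustrate the explanation above, here is a rough base R sketch of how a lagged response can end up too short to supply the recent values a forecast needs. This is not fable's internal code. It assumes, hypothetically, that the model uses a maximum lag of 7 (a weekly seasonal lag on daily data); the thread does not show which lags NNETAR() actually selected. The helper `last_k()` is made up for this example.

```r
# Illustrative only: base R, not fable's internal code.
# Assumption: the fitted model uses a maximum lag of 7.
max_lag <- 7

# Take the last k values of x, padding with NA at the front when x is too short.
last_k <- function(x, k) {
  n <- length(x)
  if (n >= k) x[(n - k + 1):n] else c(rep(NA_real_, k - n), x)
}

for (n in c(9, 13, 14)) {
  y <- as.numeric(seq_len(n))                 # stand-in for a complete series
  response <- y[(max_lag + 1):n]              # rows left once lags are available
  from_response <- last_k(response, max_lag)  # recent values taken from the response
  from_series   <- last_k(y, max_lag)         # recent values taken from the full series
  cat(n, "rows: NA in response tail =", anyNA(from_response),
      "; NA in series tail =", anyNA(from_series), "\n")
}
```

Under this assumption, the tail taken from the response contains NAs for 9 to 13 rows and is complete only from 14 rows onward, which is consistent with the threshold reported in the issue. The tail taken from the full series is complete in every case.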