robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

as.data.frame.forecast breaks for n>9990 #527

Closed kevinykuo closed 7 years ago

kevinykuo commented 7 years ago

traceback() says something is wrong with the print() call.

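library(forecast)  # auto.arima(), forecast()
library(zoo)       # zoo()
library(magrittr)  # %>%

# Fit an ARIMA model to n white-noise observations indexed 1..n,
# then forecast 10 steps ahead.
forecast_bug <- function(n) {
  fit <- data.frame(value = rnorm(n), ind = 1:n) %>%
    (function(x) zoo(x$value, x$ind)) %>%
    auto.arima()
  fit %>%
    forecast(h = 10)
}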
set.seed(42)
forecast_bug(9990)
#       Point Forecast     Lo 80    Hi 80    Lo 95   Hi 95
# 9991               0 -1.297673 1.297673 -1.98462 1.98462
# 9992               0 -1.297673 1.297673 -1.98462 1.98462
# 9993               0 -1.297673 1.297673 -1.98462 1.98462
# 9994               0 -1.297673 1.297673 -1.98462 1.98462
# 9995               0 -1.297673 1.297673 -1.98462 1.98462
# 9996               0 -1.297673 1.297673 -1.98462 1.98462
# 9997               0 -1.297673 1.297673 -1.98462 1.98462
# 9998               0 -1.297673 1.297673 -1.98462 1.98462
# 9999               0 -1.297673 1.297673 -1.98462 1.98462
# 10000              0 -1.297673 1.297673 -1.98462 1.98462
forecast_bug(9991)
# Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
#                       duplicate row.names: 1.e+04
kevinykuo commented 7 years ago

It seems to be sensitive to the scaling of the index: for example, if we write ind = 1:n / 1000 it works OK...

mitchelloharawild commented 7 years ago

Better MRE:

rnorm(9991) %>% ts %>% auto.arima %>% forecast(h=10)
mitchelloharawild commented 7 years ago

@robjhyndman The issue comes from how the forecast time column is formatted (forecast.R#142):

as.double(10001) %>% format(digits=0)

For numbers this large, format() switches to scientific notation, which is truncated at this precision, so the resulting labels are no longer unique. We could extend the allowable digits, or convert the time column to a count starting from 1... There might also be a way to force data.frame to accept duplicate row names (however, this is probably hacky and will cause other data.frame issues).
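
To make the collision concrete, here is a minimal sketch in base R. It is not the package code: formatC(..., digits = 0, format = "e") is used as a stand-in for the truncating format call, and the column name `point` and the variable names are made up for illustration. It shows the duplicate labels, the resulting data.frame() error, and the two workarounds suggested above. It does not claim to reproduce what the eventual fix does.

```r
# Forecast time points for a series of length 9991 with h = 10.
times <- as.double(9992:10001)

# Assumption: formatC() with zero decimals in "e" format stands in for the
# truncating format call; every value collapses to the same label.
truncated <- formatC(times, digits = 0, format = "e")
has_dups <- anyDuplicated(truncated) > 0  # TRUE

# data.frame() refuses duplicate row names, so capture the error message.
res <- tryCatch(
  data.frame(point = rep(0, length(times)), row.names = truncated),
  error = function(e) conditionMessage(e)
)

# Workaround 1: keep fixed notation so every time point stays distinct.
fixed <- format(times, scientific = FALSE, trim = TRUE)
df_fixed <- data.frame(point = rep(0, length(times)), row.names = fixed)

# Workaround 2: label the rows with a count starting from 1.
df_count <- data.frame(point = rep(0, length(times)),
                       row.names = seq_along(times))
```

Workaround 1 keeps the original time index visible in the row names, at the cost of wider labels for long series. Workaround 2 is always unique but loses the link to the time index.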

robjhyndman commented 7 years ago

Fixed in https://github.com/robjhyndman/forecast/commit/38bb756ca2b2fe3cec3acda2468ad62620307089