robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast
1.1k stars 340 forks source link

nnetar doesn't work correctly for seasonal time series #941

Closed ignacio-ucm closed 10 months ago

ignacio-ucm commented 1 year ago

Introduction

Hello, I'm working with this package for my Master thesis and i've noticed an issue.

Bug context

When fitting Feed-forward neural network for forecasting time series, you can use the nnetar function, which allows the use of lagged inputs. These lagged inputs are defined by the parameters p (non-seasonal lags) and P (seasonal lags).

Reminder of the nnetar structure: nnetar(y, p, P = 1, size, repeats = 20, xreg = NULL, lambda = NULL, model = NULL, subset = NULL, scale.inputs = TRUE, x = y)

Bug description

Both parameters p and P are modified in the following part of the nnetar function script:

if (!useoldmodel) {
    if (m == 1) {
      if (missing(p)) {
        p <- max(length(ar(na.interp(xx))$ar), 1)
      }
      if (p == 0) {
        warning("Cannot set p = 0 for non-seasonal data; using default calculation for p")
        p <- max(length(ar(na.interp(xx))$ar), 1)
      }
      if (p >= n) {
        warning("Reducing number of lagged inputs due to short series")
        p <- n - 1
      }
      lags <- seq_len(p)
      if (P > 1) {
        warning("Non-seasonal data, ignoring seasonal lags")
      }
      P <- 0
    }
}

This modifications make sense as they occur when m=1, where m is the frequency of the time series (as we can see in the code below)

 n <- length(xx)
 xx <- as.ts(xx)
 m <- max(round(frequency(xx)), 1L)

However, the problem is that if scale.inputs=TRUE, the object xx will always have a frequency of 1.

In other words, if you want to use a seasonal time series but specify scale.inputs=TRUE, your time series will be treated as if they were not seasonal.

Origin of the bug

Why will xx always have a frequency of 1? To answer this question, let's take a look at the part of the code where this object is created;

if (!is.null(lambda)) {
    xx <- BoxCox(x, lambda)
    lambda <- attr(xx, "lambda")
} else {
    xx <- x
}

Whether lambda is null or not, "xx" is a ts object with a defined frequency attribute, as long as the "x" ts object also had a defined frequency attribute (remember x=y).

However, we can see in the following part of the script how "xx" will no longer be an object of type ts and will become a numeric vector:

if (scale.inputs) {
    if (useoldmodel) {
      scalex <- model$scalex
    }
    else {
      tmpx <- scale(xx[xsub], center = TRUE, scale = TRUE)
      scalex <- list(center = attr(tmpx, "scaled:center"), 
                     scale = attr(tmpx, "scaled:scale"))
    }
    xx <- scale(xx, center = scalex$center, scale = scalex$scale)
    xx <- xx[, 1]
}

The scale function transforms "xx" into a matrix object and the following line makes this matrix become a vector. This is a problem because the frequency attribute is lost.

 n <- length(xx)
 xx <- as.ts(xx)
 m <- max(round(frequency(xx)), 1L)

In the lines above, as.ts avoids getting an error in frequency(xx) but it doesn't preserve the frequency information since "xx" will now be a newly created ts object with no frequency assigned.

Solution

The fastest and simplest solution I can think of is changing these lines:

xx <- as.ts(xx)
m <- max(round(frequency(xx)), 1L)

to these lines:

xx <- ts(xx, frequency = frequency(x))
m <- max(round(frequency(x)), 1L)

as "x" is still a ts object with the original frequency defined.

I hope I have explained myself correctly. Thank you very much for your time!

robjhyndman commented 1 year ago

I don't think this is a problem. xx[,1] is a ts object with the correct frequency. You can see that the resulting model is seasonal with this simple example:

library(forecast)
nnetar(USAccDeaths) 
#> Series: USAccDeaths 
#> Model:  NNAR(2,1,2)[12] 
#> Call:   nnetar(y = USAccDeaths)
#> 
#> Average of 20 networks, each of which is
#> a 3-2-1 network with 11 weights
#> options were - linear output units 
#> 
#> sigma^2 estimated as 149061

Created on 2023-06-18 with reprex v2.0.2

Perhaps provide a reproducible example demonstrating the problem you are observing.

ignacio-ucm commented 1 year ago

Apparently I was getting this error because of not using the function directly but executing its code line by line. If instead of making

nnetar(USAccDeaths) 

you make

y <- USAccDeaths
P <- 1
repeats <- 20
xreg <- NULL
lambda <- NULL
model <- NULL
subset <- NULL
scale.inputs <- T
x <- y

and you execute the script line by line, then you will always get m=1

  useoldmodel <- FALSE
  yname <- deparse(substitute(y))
  if (!is.null(model)) {
    useoldmodel <- TRUE
    if (!is.nnetar(model)) {
      stop("Model must be a nnetar object")
    }
    m <- max(round(frequency(model$x)), 1L)
    minlength <- max(c(model$p, model$P * m)) + 1
    if (length(x) < minlength) {
      stop(paste("Series must be at least of length", minlength, 
                 "to use fitted model"))
    }
    if (tsp(as.ts(x))[3] != m) {
      warning(paste("Data frequency doesn't match fitted model, coercing to frequency =", 
                    m))
      x <- ts(x, frequency = m)
    }
    if (!is.null(model$xreg)) {
      if (is.null(xreg)) {
        stop("No external regressors provided")
      }
      if (NCOL(xreg) != NCOL(model$xreg)) {
        stop("Number of external regressors does not match fitted model")
      }
    }
    lambda <- model$lambda
    size <- model$size
    p <- model$p
    P <- model$P
    if (p == 0 && P == 0) {
      stop("Both p = 0 and P = 0 in supplied 'model' object")
    }
    if (P > 0) {
      lags <- sort(unique(c(seq_len(p), m * (seq_len(P)))))
    } else {
      lags <- seq_len(p)
    }
    if (is.null(model$scalex)) {
      scale.inputs <- FALSE
    }
  } else {
    if (length(y) < 3) {
      stop("Not enough data to fit a model")
    }
    constant_data <- is.constant(na.interp(x))
    if (constant_data) {
      warning("Constant data, setting p=1, P=0, lambda=NULL, scale.inputs=FALSE")
      scale.inputs <- FALSE
      lambda <- NULL
      p <- 1
      P <- 0
    }
    if (!is.null(xreg)) {
      constant_xreg <- any(apply(as.matrix(xreg), 2, function(x) is.constant(na.interp(x))))
      if (constant_xreg) {
        warning("Constant xreg column, setting scale.inputs=FALSE")
        scale.inputs <- FALSE
      }
    }
  }
  if (any(is.na(x))) {
    warning("Missing values in x, omitting rows")
  }
  if (!is.null(lambda)) {
    xx <- BoxCox(x, lambda)
    lambda <- attr(xx, "lambda")
  } else {
    xx <- x
  }
  xsub <- rep(TRUE, length(x))
  if (is.numeric(subset)) {
    xsub[-subset] <- FALSE
  }
  if (is.logical(subset)) {
    xsub <- subset
  }
  scalex <- NULL
  if (scale.inputs) {
    if (useoldmodel) {
      scalex <- model$scalex
    } else {
      tmpx <- scale(xx, center = TRUE, scale = TRUE)
      scalex <- list(center = attr(tmpx, "scaled:center"), 
                     scale = attr(tmpx, "scaled:scale"))
    }
    xx <- scale(xx, center = scalex$center, scale = scalex$scale)
    xx <- xx[, 1]
  }
  xxreg <- NULL
  scalexreg <- NULL
  if (!is.null(xreg)) {
    xxreg <- xreg <- as.matrix(xreg)
    if (length(x) != NROW(xreg)) {
      stop("Number of rows in xreg does not match series length")
    }
    if (any(is.na(xreg))) {
      warning("Missing values in xreg, omitting rows")
    }
    if (scale.inputs) {
      if (useoldmodel) {
        scalexreg <- model$scalexreg
      } else {
        tmpx <- scale(xxreg[xsub, ], center = TRUE, scale = TRUE)
        scalexreg <- list(center = attr(tmpx, "scaled:center"), 
                          scale = attr(tmpx, "scaled:scale"))
      }
      xxreg <- scale(xxreg, center = scalexreg$center, 
                     scale = scalexreg$scale)
    }
  }

>   print(class(xx))
[1] "numeric"
>   print(frequency(xx))
[1] 1
>  n <- length(xx)
>   print(class(xx))
[1] "numeric"
>   xx <- as.ts(xx)
>   print(class(xx))
[1] "ts"
>   print(frequency(xx))
[1] 1
>   m <- max(round(frequency(xx)), 1L)
>   print(m)
[1] 1

I still find it curious that it doesn't work for me but I know this is not the way that nnetar was intended to be used. Thank you very much and, again, sorry for bothering you.

robjhyndman commented 1 year ago

That's because there is a hidden forecast:::scale.ts function which ensures scale returns a ts object when the input is a ts object.