No, this only accounts for a small part of the uncertainty. The inherent residual uncertainty would be ignored.
If one considers the predictions from the different networks as a bootstrap estimate of the conditional expectation (arguable, since currently all the networks are fit to exactly the same data, just with differently initialized parameters), we could add, in a somewhat ad hoc manner, an estimate of the error term by bootstrapping the residuals from the fits.
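For instance, something rough along these lines (illustrative only, using the built-in lynx series as a stand-in; it just adds residuals resampled with replacement to the averaged point forecast, without worrying about how errors would compound over the horizon):
library(forecast)

set.seed(123)
fit <- nnetar(lynx)
res <- na.omit(residuals(fit))   # in-sample residuals from the averaged fit
h <- 10
fc <- forecast(fit, h = h)$mean  # averaged point forecast

## add resampled residuals to the point forecast as a crude error term
sims <- replicate(1000, fc + sample(res, h, replace = TRUE))
lower <- apply(sims, 1, quantile, probs = 0.025)
upper <- apply(sims, 1, quantile, probs = 0.975)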
Otherwise, for a more formally developed approach, how about implementing one of the bootstrap approaches presented here, since nnetar is essentially a nonlinear autoregressive model?
Their approach essentially follows the algorithm on pg. 7 of the paper. This should account for variability in both the errors and the model estimation. I should note that they present the algorithm for linear autoregressive models, but they do state (on pg. 17) that it applies in the same way to nonlinear models (with some caveats/concerns for forecasts beyond one-step-ahead).
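To make the refitting idea concrete, here is a crude sketch (this is not the paper's exact procedure, only an illustration of regenerating a series from resampled residuals, refitting, and forecasting; the dataset, B, p, and the interval level are arbitrary choices):
library(forecast)

set.seed(321)
y <- lynx
h <- 10
B <- 50   # bootstrap replicates (kept small; refitting networks is expensive)

fit <- nnetar(y, p = 3)
res <- na.omit(residuals(fit))
sims <- matrix(NA_real_, nrow = B, ncol = h)

for (b in seq_len(B)) {
  ## build a bootstrap series from the fitted values plus resampled residuals,
  ## refit the network on it, and forecast with a fresh error draw; the refit
  ## is what captures the model-estimation uncertainty
  y_star <- na.omit(fitted(fit) + sample(res, length(y), replace = TRUE))
  fit_star <- nnetar(ts(y_star), p = 3)
  sims[b, ] <- forecast(fit_star, h = h)$mean + sample(res, h, replace = TRUE)
}

pi_80 <- apply(sims, 2, quantile, probs = c(0.1, 0.9))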
In fact, this could be implemented as a more general function, although a number of the other forecast methods already have their own specific bootstrap code. Do you think this sounds like a useful thing to do?
The problem with Pan & Politis' algorithm is that it assumes stationarity, which will not necessarily be true for neural net models. They also spend a lot of effort on re-fitting models in order to take account of model uncertainty. I think we can get around a lot of that work with neural nets because the model uncertainty is already understood through the various networks with random starting values.
So something like this should do ok:
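A sketch along these lines (illustrative, using the lynx data; note the simplification that the resampled errors are added to each network's mean forecast rather than being fed back through the lagged inputs, which a proper step-by-step simulation would do):
library(forecast)

set.seed(1)
y <- lynx
h <- 10
npaths <- 1000
nnets <- 20   # pool of networks differing only in their random starting weights

fits <- lapply(seq_len(nnets), function(i) nnetar(y, p = 3, repeats = 1))
res <- na.omit(unlist(lapply(fits, residuals)))             # pooled residuals
means <- sapply(fits, function(f) forecast(f, h = h)$mean)  # h x nnets matrix

## each sample path: a randomly chosen network's forecast plus resampled errors
paths <- replicate(npaths,
                   means[, sample(nnets, 1)] + sample(res, h, replace = TRUE))
pi_95 <- apply(paths, 1, quantile, probs = c(0.025, 0.975))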
That is not as sophisticated as P&P, but should do a pretty good job of getting sensible prediction intervals, taking account of both residual variance and model uncertainty.
Yes, I think that would be very useful.
I agree that the Pan & Politis algorithm would be much more computationally expensive. The one part I'm not completely sure of is how much of the model uncertainty would be captured simply by using the neural networks fit with random starting values. How many iterations are run and how much regularization is used can affect their degree of convergence and stability. On the one hand, if they're iterated for long enough, they should all converge closer together (depending on how multimodal the space may be). On the other hand, iterating for too long might lead them to overfit.
If they converge to fairly similar values and all overfit together, using them to sample future paths would not reveal that the model is wrong. In that situation, re-fitting a new network with bootstrapped noise would better capture model overfitting.
If they're drastically different from each other, creating sample paths from the individual networks might not give a representative estimate of the uncertainty in the point forecast, since that value is an average of all of them.
I attached a couple of figures comparing how much they vary as the number of iterations and the weight decay (i.e. regularization) change for a sample dataset. Note that the default parameters of nnet are maxit=100 and decay=0. The blue lines are 100 different single-network predictions (not sampled future paths, just the forecast from each network). The exact parameters I chose to display are somewhat arbitrary, picked to show the different behaviors; the effects would show up at different points depending on the size and complexity of the problem. I added the code I used for these simple plots below.
Nevertheless, I think of these points more as "food for thought", since I agree that your simplified algorithm should be the starting point anyway. Once that's in place, we can test the performance more directly and reassess whether more is needed.
nn_variations_few-iter.pdf nn_variations_more-iter.pdf nn_variations_many-iter.pdf
library(forecast)
library(fpp)   # provides the usconsumption data set
set.seed(1234)
## split the 164 quarterly observations into training and test sets
ntrain <- 120
ntest <- 164 - ntrain
ts_train <- ts(usconsumption[1:ntrain, 1])
ts_test <- ts(usconsumption[(ntrain + 1):164, 1])
xtrain <- usconsumption[1:ntrain, 2]
xtest <- usconsumption[(ntrain + 1):164, 2]
##
iter <- 100    # number of repeated fits overlaid per panel
p <- 3         # number of lagged inputs
maxit <- 300   # nnet iterations per network (default is 100)
##
par(mfcol = c(2, 3))   # one column of plots per decay value
for (decay in c(0, 0.3, 1)) {
  ## top panel: no external regressor
  plot(ts_train, xlim = c(0, 165),
       main = paste0("without xreg, decay = ", decay, ", maxit = ", maxit))
  for (i in 1:iter) {
    nn_fit <- nnetar(ts_train, p = p, decay = decay, maxit = maxit)
    nn_fcast <- forecast(nn_fit, h = ntest)
    lines(nn_fcast$mean, col = "blue")
    lines((length(ts_train) + 1):164, ts_test)
  }
  ## bottom panel: with external regressor, a single network per fit
  plot(ts_train, xlim = c(0, 165),
       main = paste0("with xreg, decay = ", decay, ", maxit = ", maxit))
  for (i in 1:iter) {
    nnxreg_fit <- nnetar(ts_train, p = p, decay = decay, maxit = maxit,
                         xreg = xtrain, repeats = 1)
    nnxreg_fcast <- forecast(nnxreg_fit, xreg = xtest, h = ntest)
    lines(nnxreg_fcast$mean, col = "blue")
    lines((length(ts_train) + 1):164, ts_test)
  }
}
That doesn't look good! We will probably need to include model re-fitting.
Oops, I just noticed that for the top row of plots I left out the repeats argument (and thus used the default of 20). The lines aren't as tight with the correct value, but the general point still stands.
Since the nnetar object is an average of (by default) 20 individual neural nets, could we build prediction intervals for the final nnetar forecast using the predictions from each individual net? This would seem especially appropriate when repeats is large. We're basically creating everything we need for bootstrap prediction intervals right now but throwing it away when we wrap it up into a point forecast.
Edit: several approaches are described here: http://alumnus.caltech.edu/~amir/pred-intv-2.pdf
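A rough sketch of that idea (approximating the per-network forecasts by refitting with repeats=1 rather than pulling the individual nets out of one nnetar object; this only reflects the spread due to random starting weights, not the residual variance):
library(forecast)
library(fpp)   # usconsumption

set.seed(42)
y <- ts(usconsumption[1:120, 1])
h <- 12
B <- 100   # number of single-network fits

## point forecast from B independently initialized single networks
fc_mat <- replicate(B, forecast(nnetar(y, p = 3, repeats = 1), h = h)$mean)

## empirical 80% interval across the networks
lower <- apply(fc_mat, 1, quantile, probs = 0.1)
upper <- apply(fc_mat, 1, quantile, probs = 0.9)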