mingstat / ZIM

Zero-Inflated Models (ZIM) for Count Time Series Data with Excess Zeros
https://mingstat.github.io/ZIM/
7 stars 1 forks source link

Forecasting with ZIM / forecast method? #1

Open andybega opened 5 years ago

andybega commented 5 years ago

Hi, thanks for writing and making your package publicly available!

If I understand correct there is no straightforward way to forecast with ZIM right now. I'd like to make a request for a forecast (or predict) method.

Since the forecast package already includes a broad range of univariate forecasting models with a consistent API, it would be nice to have something that is easy to use in a similar fashion, e.g.:

library("forecast")
library("ZIM")

data(syph)
count <- syph$a33
count.lag1 <- bshift(count > 0)
trend <- 1:length(count) / 1000

mdl1 <- auto.arima(count)
forecast(mdl1, h = 3)
# Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
# 210       2.836695 -1.014251 6.687642 -3.052820 8.726211
# 211       2.636530 -1.243722 6.516783 -3.297804 8.570865
# 212       2.618010 -1.265695 6.501715 -3.321605 8.557624

# with covariates
mdl2 <- auto.arima(head(count, -3), xreg = head(trend, -3))
forecast(mdl2, h = 3, xreg = tail(trend, 3))
# Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
# 207       2.148899 -1.655740 5.953538 -3.669794 7.967592
# 208       2.135985 -1.668653 5.940624 -3.682707 7.954678
# 209       2.123072 -1.681567 5.927711 -3.695621 7.941765

# ZIM
mdl3 <- zim(count ~ count.lag1 + trend | trend)
# e.g.:
forecast(mdl1, h = 6, xreg = ..., zreg = ...)
# something something here
mingstat commented 5 years ago

Thank you very much for your suggestion!

ZIM currently has no built-in function that can be directly used to forecast/predict. To do so, one will need to extract results from fitted models. For example,

# regression coefficients
mdl3$para
[1]  1.4894234  0.2211137 -1.0100405 -1.9332129  8.6051691
# variance-covariance
solve(mdl3$info)
             [,1]          [,2]         [,3]          [,4]         [,5]
[1,]  0.014386842 -0.0097871129 -0.054419108  0.0019238845 -0.010199695
[2,] -0.009787113  0.0101438422  0.016866160 -0.0009788275  0.003066099
[3,] -0.054419108  0.0168661602  0.444716763 -0.0087614983  0.078876967
[4,]  0.001923885 -0.0009788275 -0.008761498  0.1383537758 -0.939161355
[5,] -0.010199695  0.0030660993  0.078876967 -0.9391613549  7.886376532

Adding point predictions should be straightforward, but figuring out the prediction intervals may require some thinking and extra work. I'll take a look at the API of the {forecast} package and consider adding a forecast/predict S3 function to ZIM.

andybega commented 5 years ago

Thanks!

ladm2110 commented 5 years ago

Is there any work in progress towards this objective?

mingstat commented 5 years ago

Thanks for checking! Computing prediction intervals requires a significant amount of work to derive the correct mathematical formulas. Unfortunately, I currently have limited time to add this new feature. Any contributions are welcome and much appreciated!

ladm2110 commented 5 years ago

I have been reading your the paper "Markov regression models for count time series with excess zeros: A partial likelihood approach" and I would like to replicate your forecast results. Do you have any docs related to the one-step ahead predictions procedure you used? I am also working on a PoC for a time series with excess zeros.

mingstat commented 5 years ago

Yes, I just uploaded the data example to https://biostatstudio.github.io/ZIM/example.pdf. With this document, you should be able to replicate all the results in the paper. As I mentioned earlier, point predictions are relatively easier to implement, the difficult part is to calculate prediction intervals.

ladm2110 commented 5 years ago

Hi, I have two things I would like to clarify:

  1. It seems to me that the order of the parameters is wrong in this expression, when I print the fit.zip$para vector the order I see is (intercept, ar1, trend, ...).

    zip.lambda <- exp(sum(c(1, trend[t], ar1[t]) * fit.zip$para[1:3])) Can you comment on this?

  2. In the example the point forecast are fixed, they are stored in the vector "count.new". How where these values obtained? Is it safe to say that using the rzip method I can sample new point forecasts?

Thanks in advanced!

ladm2110 commented 5 years ago

Hi again, I don't need an answer on the first question, I see that it depends on the order of the params in the "formula" you pass to the zim method. The mistake was mine.

mingstat commented 5 years ago

The 'count.old' and 'count.new' variables contain observed time series that are used as training and validation data, respectively. In the example, the focus is to predict Pr(Y = 0) and Pr(Y > c), where 'c' is a prespecified cutoff. With 'zip.lambda' and 'zip.omega' from the fitted model, one can easily calculate the predicted mean or use the 'rzip' function to sample observations.