srlanalytics / bdfm

Bayesian dynamic factor model estimation and predictive statistics including nowcasting and forecasting
MIT License

Mixed frequency handling #29

Closed: christophsax closed this issue 5 years ago

christophsax commented 5 years ago

Your comment on an older branch:

OK, I don’t really like any of the R ts classes, and I’m not aware of any that can handle mixed frequency data (I’ve got a fair amount of mixed frequency modeling code and will probably put it into the package one day). Current input is just anything that can be handled by as.matrix(), which is pretty flexible, and period is indexed by row. Anyhow, if data is input as a time series it should be pretty easy to convert the (matrix format) output to whatever it went in as. Go tabular data!

I want to show how good tsbox is with mixed frequency data.

Here are 3 series, one quarterly, one monthly, one daily.

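# daily series; rename the columns to the time/value pair used below
stocks <- setNames(dataseries::ds("STK.GDR"), c("time", "value"))
# monthly series
exports <- dataseries::ds("TRD.A.T0")
# quarterly series
gdp <- dataseries::ds("GDP.PBRTT.A.R")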

A long data frame is just the natural way to store mixed frequency data, as it does not store tons of NA values.

library(tsbox)
data <- ts_c(stocks, exports, gdp)
#> [value]: 'TRD.A.T0'
#> [value]: 'GDP.PBRTT.A.R'
data <- ts_span(data, 2015, 2017)  # make it a bit shorter
head(data)
#>       id       time   value
#> 1 stocks 2015-01-05 8816.37
#> 2 stocks 2015-01-06 8757.10
#> 3 stocks 2015-01-07 8780.60
#> 4 stocks 2015-01-08 8999.38
#> 5 stocks 2015-01-09 8975.70
#> 6 stocks 2015-01-12 9017.40

But we can convert it to anything, so a ts object is a nice start:

data_ts <- ts_ts(data)

tail(data_ts, 40)
#>         stocks  exports      gdp
#> [693,] 8496.74       NA       NA
#> [694,] 8544.82       NA       NA
#> [695,] 8628.02       NA       NA
#> [696,]      NA       NA       NA
#> [697,]      NA       NA       NA
#> [698,] 8572.19       NA       NA
#> [699,] 8595.24       NA       NA
#> [700,] 8623.48       NA       NA
#> [701,] 8528.45 16514.43       NA
#> [702,] 8526.58       NA       NA
#> [703,]      NA       NA       NA
#> [704,]      NA       NA       NA
#> [705,] 8590.56       NA       NA
#> [706,] 8654.48       NA       NA
#> [707,] 8665.60       NA       NA
#> [708,] 8685.38       NA       NA
#> [709,] 8827.99       NA       NA
#> [710,]      NA       NA       NA
#> [711,]      NA       NA       NA
#> [712,] 8761.89       NA       NA
#> [713,] 8882.92       NA       NA
#> [714,] 8861.49       NA       NA
#> [715,] 8924.73       NA       NA
#> [716,] 8945.48       NA       NA
#> [717,]      NA       NA       NA
#> [718,]      NA       NA       NA
#> [719,] 8954.67       NA       NA
#> [720,] 8964.94       NA       NA
#> [721,] 8958.17       NA       NA
#> [722,] 8966.45       NA       NA
#> [723,] 8964.87       NA       NA
#> [724,]      NA       NA       NA
#> [725,]      NA       NA       NA
#> [726,]      NA       NA       NA
#> [727,] 8997.41       NA       NA
#> [728,] 8996.88       NA       NA
#> [729,] 8992.19       NA       NA
#> [730,] 8965.70       NA       NA
#> [731,]      NA       NA       NA
#> [732,]      NA 18292.86 168905.1

See how nicely the series are aligned. Weekdays are handled correctly, including the day off in row 726. This works for weekly, biweekly, biannual, and even intraday series. Whatever the input, series are regularized, aligned, and filled with NAs.
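For readers without tsbox at hand, the idea of regularizing and aligning can be sketched in base R. This is only an illustration, not how tsbox does it internally: the data are made up, the grid is monthly rather than daily, and the quarterly value is assumed to be dated at the first month of its quarter.

```r
# Hypothetical toy data: one monthly and one quarterly series, in long format
monthly <- data.frame(
  id    = "exports",
  time  = seq(as.Date("2015-01-01"), by = "month", length.out = 6),
  value = c(10, 11, 12, 13, 14, 15)
)
quarterly <- data.frame(
  id    = "gdp",
  time  = seq(as.Date("2015-01-01"), by = "quarter", length.out = 2),
  value = c(100, 101)
)
long <- rbind(monthly, quarterly)  # 8 rows, no NAs stored

# Regular grid at the highest frequency present (monthly here)
grid <- data.frame(time = seq(min(long$time), max(long$time), by = "month"))

# Left-join each series onto the grid; periods without a value become NA
wide <- grid
for (s in unique(long$id)) {
  x <- long[long$id == s, c("time", "value")]
  names(x)[2] <- s
  wide <- merge(wide, x, by = "time", all.x = TRUE)
}

# Rows index periods, columns index series
mat <- as.matrix(wide[, -1])
sum(is.na(mat))  # NAs that only exist in the wide layout
```

The long data frame holds 8 observations, while the aligned 6 x 2 matrix needs 4 extra NA cells for the months in which the quarterly series has no value.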

To get the matrix we need for further processing:

mat <- unclass(data_ts)
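Going back from matrix output to a ts object is mostly a matter of keeping the time attributes around. A minimal base R sketch, using a made-up monthly series and a placeholder `m * 2` standing in for whatever matrix-in, matrix-out step is applied (it is not a bdfm call):

```r
# Hypothetical toy input: a monthly ts with two columns
x <- ts(
  cbind(a = 1:6, b = c(NA, 2, NA, 4, NA, 6)),
  start = c(2015, 1), frequency = 12
)

tsp_in <- tsp(x)         # start, end, frequency of the input
m <- unclass(x)          # matrix that still carries a "tsp" attribute
attr(m, "tsp") <- NULL   # now a plain matrix, periods indexed by row

out <- m * 2             # placeholder for any processing step

# Restore the time attributes on the output
y <- ts(out, start = tsp_in[1], frequency = tsp_in[3])
```

This assumes the processing step keeps the number and order of rows; if it drops or adds periods, the start date has to be adjusted accordingly.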

Created on 2018-12-15 by the reprex package (v0.2.1)

srlanalytics commented 5 years ago

OK, thanks for the good example! I was thinking of putting basic mixed frequency support into the package. Dealing with daily/monthly stuff is too much (due to differing numbers of days in the month... I've done this in some of my production code but it gets a little heavy), but basic support shouldn't be too hard.
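The differing month lengths mentioned above are easy to see in base R. A small sketch for 2015 (the year is chosen only for illustration):

```r
# First day of each month in 2015, plus the first day of January 2016
month_starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 13)

# Calendar days per month: differences between consecutive month starts
days_in_month <- as.integer(diff(month_starts))
names(days_in_month) <- format(month_starts[-13], "%Y-%m")

days_in_month
range(days_in_month)
```

Because the number of daily observations per monthly period is not constant, a daily/monthly model cannot rely on a fixed ratio between the two frequencies, unlike the monthly/quarterly case where every quarter has exactly three months.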

srlanalytics commented 5 years ago

Decided to go ahead and include support for mixed frequency data. Things seem to run well at the moment... have a look at the [US_GDP.R](https://github.com/srlanalytics/bdfm/blob/seas_we()_updates/inst/Examples/US_GDP.R) example. Need to do a bit more testing to make sure it's all doing the right thing, but looks good at the moment. Phew!

Defaults are all still the same... i.e. uniform frequency is default.

This is pretty good stuff... what about a short paper comparing methods for handling mixed frequency data using bdfm?