srlanalytics / bdfm

Bayesian dynamic factor model estimation and predictive statistics including nowcasting and forecasting
MIT License

Mixed frequencies in ML and PC #79

Closed christophsax closed 5 years ago

christophsax commented 5 years ago

This is currently not possible. Can we not use the same routines here as well? Both seem highly useful.

SethOttoQuant commented 5 years ago

Both ML and PC estimation have issues with mixed frequency estimation, though various ways around these problems are common enough. I'll begin with ML estimation:

Maximum Likelihood Estimation

Mixed frequency estimation via maximum likelihood is no problem when we estimate the model numerically, i.e. by searching over all possible parameter values for those which maximize the log likelihood from the Kalman Filter. We do not use this approach in the bdfm package as it is very slow and subject to getting stuck in local (but not global) maxima.
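To make the numerical approach concrete, here is a toy sketch (in Python rather than the package's R, and not the bdfm code itself) of maximizing the Kalman filter log likelihood by search: a one-factor AR(1) state with a noisy observation, where we recover the persistence parameter by minimizing the negative log likelihood. All names here are illustrative assumptions.

```python
# Toy sketch: numerical ML for x_t = phi * x_{t-1} + e_t, y_t = x_t + v_t
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def simulate(phi, n=300, q=1.0, r=0.5):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
    return x + rng.normal(scale=np.sqrt(r), size=n)

def neg_loglik(phi, y, q=1.0, r=0.5):
    # scalar Kalman filter; accumulates the Gaussian log likelihood
    x, p, ll = 0.0, 10.0, 0.0
    for yt in y:
        x, p = phi * x, phi**2 * p + q      # predict
        f = p + r                           # innovation variance
        v = yt - x                          # innovation
        ll += -0.5 * (np.log(2 * np.pi * f) + v**2 / f)
        k = p / f                           # Kalman gain
        x += k * v                          # update
        p *= (1 - k)
    return -ll

y = simulate(0.8)
res = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99),
                      args=(y,), method="bounded")
print(round(res.x, 2))  # estimate should be close to the true phi = 0.8
```

Even in this one-parameter case the filter runs once per likelihood evaluation; with a realistic parameter count the search is what makes this approach slow and prone to local maxima.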

When estimating the model via Watson and Engle's algorithm mixed frequency estimation becomes a little problematic due to the mechanics of the algorithm. The algorithm works by keeping track of an extra lag of factors (i.e. one more than we specify in the transition equation) to derive the adjustments to our otherwise OLS estimates of parameters. These adjustments compensate for the fact that factors are not observed.

With mixed frequency data, we specify the transition equation at high frequency. However, we typically want to include enough lags in the model to cover at least one lag of the data at low frequency, sometimes more. With quarterly-monthly data that's no big deal --- we need at least 3 lags (three months in a quarter). With monthly-daily data, however, this becomes problematic. We need at least 31 lags of the data, which makes the Watson Engle algorithm prohibitively slow: we must use the standard Kalman smoother rather than a more efficient disturbance smoother, because the latter does not return the variance of factors given all observations, which the Watson Engle algorithm requires. With Bayesian estimation things aren't so slow, because we are able to use the disturbance smoother, which handles lots of lags much better.
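A quick back-of-the-envelope illustration of why those lags hurt: in companion (stacked) form the state dimension grows linearly in the number of lags, and the standard Kalman smoother's cost grows roughly with the cube of the state dimension. The helper below is hypothetical, just arithmetic.

```python
# Hypothetical helper: state dimension of the stacked (companion-form)
# transition equation, and the implied relative smoother cost.
def state_dim(n_factors: int, lags: int) -> int:
    return n_factors * lags

quarterly_monthly = state_dim(2, 3)    # >= 3 lags: months in a quarter
monthly_daily = state_dim(2, 31)       # >= 31 lags: days in the longest month

# standard Kalman smoother cost scales roughly cubically in state dimension
relative_cost = (monthly_daily / quarterly_monthly) ** 3
print(quarterly_monthly, monthly_daily, round(relative_cost))
```

With two factors, going from 3 to 31 lags inflates the state from 6 to 62 elements, a roughly thousand-fold increase in smoothing cost, which is why the disturbance smoother matters so much here.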

The only way around this I'm aware of that is not prohibitively slow (slow meaning numerical estimation or Watson Engle estimation with lots of lags) is to specify the model at low frequency. Whether PC, Bayesian, or ML, specifying the model at low frequency is a totally different approach from what we've done with this package. I've done it (it's the approach I use with my new company OttoQuant), but I'm not really willing to give it away.

Principal Component Estimation

Principal component estimation relies on there being few missing observations, since there is no particularly clean way to calculate principal components when observations are missing. With mixed frequency data, lots of observations are necessarily missing, as we have no high frequency observations of low frequency series. The way most people get around this issue is to estimate principal components using the high frequency data only, then plug the resulting parameter estimates into the dynamic factor model. However, if your low frequency series are central to your model, or if you don't have many high frequency series, this approach probably won't work well. It's a fudge and I don't really like it, so I've not included it. However, it is very easy to do using the existing bdfm framework, so if you want to include it we can, with the disclaimer that the results will probably be poor.
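The fudge described above can be sketched in a few lines (Python/NumPy here for illustration, not the package's R code; all data is simulated). We build one common factor, observe a "monthly" series only every third period, drop the incomplete column, and take principal components of the fully observed columns only.

```python
# Sketch of the common fudge: principal components from the fully
# observed high-frequency columns only, ignoring low-frequency series.
import numpy as np

rng = np.random.default_rng(1)
T, n_hf = 120, 8
f = rng.normal(size=(T, 1))                        # one common factor
hf = f @ rng.normal(size=(1, n_hf)) + 0.3 * rng.normal(size=(T, n_hf))

# a low-frequency series observed only every 3rd period -> mostly missing
lf = f[:, 0] + 0.3 * rng.normal(size=T)
lf[np.arange(T) % 3 != 2] = np.nan

X = np.column_stack([hf, lf])
complete = ~np.isnan(X).any(axis=0)                # only HF columns survive
Z = X[:, complete]
Z = (Z - Z.mean(0)) / Z.std(0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = U[:, 0] * s[0]                               # first principal component

# up to sign, the PC should track the true factor closely
corr = abs(np.corrcoef(pc1, f[:, 0])[0, 1])
print(round(corr, 2))
```

This works well exactly when the high-frequency panel is large and informative about the factors; when the low-frequency series carry the signal, dropping them (as above) is what makes the results poor.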

christophsax commented 5 years ago

"With monthly-daily data, however, this becomes problematic"

I thought the Bayesian method also does not support monthly-daily. Or does it? If so, we need an example! I would be happy with standard quarterly-monthly ML; not supporting the daily stuff is fine.

SethOttoQuant commented 5 years ago

The Bayesian method should cover any frequency mix: monthly-quarterly, hours-half hours, even minutes-seconds, though with that many lags (60) it would probably be slow. The one caveat is that the package does not handle irregular frequencies properly. When given mixed frequency data, the package looks for patterns of missing observations and interprets regular patterns as mixed frequency data. A frequency argument of 1 means the series is observed every period, 2 means it is observed every other period, 3 every third period, and so on. With monthly data, more months have 31 days than anything else, so if the data mix is monthly-daily the package sets the frequency of monthly data to 31. This means we will use too many days in 30-day months like September --- observations for September would actually use factors for September as well as the last day of August. Not ideal, but much simpler than trying to change the frequency mix at every period.
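The pattern-detection idea can be illustrated with a toy function (Python for illustration; this is not the package's actual detection code): a regular missing-observation pattern maps directly to a frequency argument.

```python
# Toy illustration: infer a frequency argument from a regular pattern of
# missing observations (1 = every period, 3 = every 3rd period, ...).
import numpy as np

def infer_frequency(series):
    obs = np.flatnonzero(~np.isnan(series))
    gaps = np.diff(obs)
    # a regular pattern means every gap between observations is equal
    return int(gaps[0]) if len(gaps) and (gaps == gaps[0]).all() else None

daily = np.arange(10.0)            # observed every period -> frequency 1
monthly = np.full(93, np.nan)
monthly[30::31] = 1.0              # observed every 31st period -> frequency 31
print(infer_frequency(daily), infer_frequency(monthly))
```

A real monthly-in-days series would have gaps of 28, 30, and 31, which is exactly the irregularity the comment above says the package approximates away by fixing the frequency at 31.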

I'll work on putting together an example with monthly-daily data.

Regarding mixed frequency ML estimation, I'd rather not get into it as (a) it's a big project and would take a lot of time, as the filtering/smoothing algorithms for ML are not the same as the Bayesian filtering/smoothing algorithms, (b) it would still be slow for longer lag lengths, and (c) estimates won't be any better than Bayesian estimates anyhow --- and probably worse.

christophsax commented 5 years ago

Thanks, that's very useful.

I didn't know that the Bayesian method covers any frequency mix. I just opened a separate issue with the pros and cons of the different methods (#80) and one for the example (#81).

To summarize, you are saying:

- ML supports mixed frequencies in principle, but with many lags (e.g. monthly-daily) the Watson Engle algorithm becomes prohibitively slow.
- PC supports mixed frequencies only via the high-frequency-only fudge, which will probably give poor results.

So I am fine with both of them not supporting mixed frequencies.

One final question:

Let's say I have 1000 daily financial series and 10 monthly and quarterly series. Would it make sense to use PC to aggregate the 1000 series to a few factors, and use them within the Bayesian method, along with the 10 series?
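The pre-aggregation step being proposed could look something like this sketch (Python/NumPy for illustration, simulated data; the hand-off to bdfm itself is not shown): collapse the large daily panel into a few principal components, which would then stand in for the 1000 series.

```python
# Sketch of the proposed two-step approach: compress many daily series
# into a few principal components before modeling them jointly with the
# handful of monthly/quarterly series.
import numpy as np

rng = np.random.default_rng(2)
T, n = 200, 1000
factors = rng.normal(size=(T, 3))                  # 3 true common factors
daily = factors @ rng.normal(size=(3, n)) + rng.normal(size=(T, n))

Z = (daily - daily.mean(0)) / daily.std(0)         # standardize columns
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :3] * s[:3]                             # 3 aggregate daily factors

share = (s[:3] ** 2).sum() / (s ** 2).sum()        # variance they explain
print(pcs.shape, round(share, 2))
```

With a panel this large the first few components capture most of the common variation, so passing `pcs` (plus the 10 slow series) into the Bayesian model keeps the state small while losing little information.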

Otherwise, we can close this.