Open azev77 opened 4 years ago
Thanks for the praise!
Are you thinking about automatic lag order selection? The package can already to that, see the documentation here. The short of it is that the function selectmodel
can automatically select the best lag order for both the ARCH specification and the ARMA mean specification. By default it uses the BIC criterion, but you can pass you own. The function is multihreaded, so it will be very fast after the first run (in the first run, code generation is the bottleneck, which is single threaded for now).
@s-broda I'm embarrassed to say I missed that part of the docs. I see it now & it's great!
A table like this in the docs might help other users better see the "menu" of options:
It might even help to clarify that:
all conditional means are nested in ARMA
all conditional vol models (except EGARCH
) are nested in TGARCH
Turns out you have a lot more than Auto ARCH
you have Auto E/TGARCH{o,p,q}-ARMA{p,q}
.
auto.arma
is a special case:
auto.arma(df, bic)=selectmodel(ARCH{0}, df, meanspec=ARMA, criterion=bic)
I see the source here but I'm not sure I fully understand yet.
Below I will call model tuning parameters HP (hyper-parameters). Some questions-Feature requests (not sure which ones are important/urgent). Ofcourse I don't expect you to implement all these, this list comes from my Julian Greed.
ARMA{p,q}
.
Would it be possible to incorporate more general mean models?
TSAnalysis.jl has varima
There is also ARFIMA.jl https://github.com/fipelle/TSAnalysis.jl/issues/8
Or while we're being greedy, VARFIMA & VARFIMAX
I realize the original goal was to forecast volatility, but you seem to have the most thorough, well-funded TS package in the ecosystem. 1:maxlags
.
is it possible to allow 0:maxlags
? (then intercept
is a special case of ARMA
)
is it possible to have different ranges for different HP: p 0:2
, q 1:2
?
currently it searches over ALL combinations of lags, auto.arima
has stepwise=true
allowing stepwise selection, which is faster but less ambitiousselectmodel
?:volatility/:variance/:return/:VaR
It would be awesome if it was possible to return the entire conditional distribution, as discussed on discourse.
Also, IIUC the current forecast horizon is h=1
, it would be great to generalize to any h
Hi @azev77
I like the table. Would you be able to open a pull request? I think it would best fit here. The code for this is here. Otherwise, you could post the markdown code and I'll add it (it seems to be included as a picture in your code).
About your other points:
GARCH{p, q}
with q=0
), which can lead to estimation problems. But there are valid reasons to want to do that. I've just opened a PR (#61) that introduces a minlags
keyword argument. It defaults to 1, but you can set it to zero, or to something larger than 1 to get only a subset. Non-"rectangular" regions are unlikely to happen, because it wouldn't dovetail with the way I've implemented selectmodel
based on CartesianIndices
.
julia> models = [selectmodel(VS, BG96; dist=D, minlags=0, maxlags=1) for VS in subtypes(UnivariateVolatilitySpec), D in setdiff(subtypes(StandardizedDistribution), [Standardized])];
julia> best_model = models[findmin(bic.(models))[2]]
This will minimize the BIC over all lag lengths, distributions, and GARCH model classes. I've added this to the documentation in #61.
5. I've implemented multi-step forecasts in #61 (except for the VaR, which requires Monte Carlo in the general case). Regarding returning the conditional distribution, I'll have to think about how the API for this would look, because `StandardizedDistribution` doesn't carry the mean and variance obviously. To get the conditional distribution, we'd need a type that wraps a `StandardizedDistribution` together with the mean and variance. But then to do anything with it, we'd need to implement a host of methods (pdf, cdf, etc). Alternatively, this could be special cased for all included distributions and just return the corresponding unstandardized distribution from Distributions.jl (I don't think they have the GED though?). Note also that like for the VaR, this would only work one-step ahead without Monte Carlo.
6. Thanks. Adding additional models is very simple, because the package is fully modular. I just need to find the time for it. Or you could give it a shot :) Just create a type that holds the coefficients in a vector, and implement the following methods for it: `nparams`, `presample`, `update`, `uncond`, `startingvals`, `constraints`, and `coefnames`. The meat of it is in `update!`, which implements the GARCH recursion. This will give you everything else (estimation, simulation, standard errors, display etc.) for free. You can use [this file](https://github.com/s-broda/ARCHModels.jl/blob/master/src/EGARCH.jl), which does this for EGARCH, for guidance.
Once CI passes, I'll merge #61 and release version 1.1. Would be cool if you could try out the new features (multi-step forecasting and minlags).
Thanks for your input!
Simon
Let me clarify what I meant by resampling above (my point # 3).
Suppose I have a time-series r_t
for t=1,...,100
and want to decide whether an AR{1} or AR{3} model has better out of sample fit.
t=1,...,90
t=91,...,100
t=91,...,100
and I compute a score for each model (eg: BIC, RMSE, MAPE etc). There are many different ways to do this. There are different types of resampling techniques (eg K-fold CV). There are many different scores. Rob Hyndman has a nice summary of Time Series Cross-Validation here.
Update: I think I figured out what selectmodel()
in your code is doing.
It is fitting every combination of selected lags and then selects the model which minimizes the criterion.
The key is that it does the model selection in-sample. The concern w/ this approach is overfitting. The traditional stats approach to fight overfitting is to penalize complexity (reward parsimony) for example by using a score such as bic
(instead of aic
). Another approach is to use cross-validation & compute the score by comparing out-of-sample predictions w/ true returns. Hansen & Lunde 2005 use the out-of-sample approach.
The updated list of potential feature requests:
Volatility specifications
:
eg these models from Hansen & Lunde 2005 are currently not implemented
Mean specifications
: varima
(e.g. TSAnalysis.jl), varfima
(e.g. ARFIMA.jl), varfimax
GARCH-M
GARCH-in-meanError Distributions
: it would be great to integrate all "legal" distributions from Distributions.jl.
For example, I created this script: https://github.com/alan-turing-institute/MLJ.jl/issues/574Probabilistic Prediction
: return conditional distribution
@ablaom coded this in MLJ: https://github.com/alan-turing-institute/MLJ.jl/issues/552
@barunik has: https://github.com/barunik/DistributionalForecasts.jl Test predictive accuracy
:
I agree that we'd need to use cross validation to get an unbiased estimate of the BIC, but as far as I can tell that's usually not what people do in this literature; they just pick the model that minimizes the BIC (or AIC) in-sample. Also, these data aren't IID, so you'd need to do something fancier than just K-fold CV. In principle this could be implemented as a scoring function that could be passed to selectmodel
.
Thanks for the wishlist! I'll see what I can add in the future. My funding for working on this has run out, so I don't have as much time for working on this. PRs are welcome though!
Note that GJR is the same as TGARCH, so this is already implemented.
I see, bc in the paper, they're defined separately:
Those two are clearly not the same (different power of sigma). Usually TGARCH is understood to be what's called GJR above.
Btw, it's common for packages to show off benchmarks (w/ R & Python) in the Readme: "We find that ARCHModels.jl is faster than Python’s ARCH package by a factor of about 6, faster than Matlab’s Econometrics Toolbox by a factor of about 12, and faster than R’s rugarch package by a factor of about 50. This is despite the fact that all packages (except ARCHModels.jl) implement the likelihood function in a compiled low-level language."
Or even better, add a nice benchmark figure like here.
Yeah, I had thought about that. The best would be to have the benchmarks run on CI automatically, but that's not a good idea because you're not guaranteed a specific machine there.
@s-broda
I'd like to work on adding family GARCH
from the list above, but not sure where to start.
Can you help me get started with this?
Hi @azev77
Sure! Basically you just need to follow the structure in TGARCH.jl. I.e. (using o
instead of m
for consistency with the rest of the code), you start buy defining a struct FamGARCH{o, p, q, T<:AbstractFloat} <: UnivariateVolatilitySpec{T}
. This struct should contain a vector for the model parameters, which should appear in the order in which they relate to the type parameters o
, p
, q
.
You then implement the following functions:
nparams
: number of parameters given o
, p
, and q
( I guess 1+ o + p + 3q
in this case?)presample
: how many presample values are needed for conditional MLE (usually max(o, p, q)
)update!
: the actual GARCH recursionuncond
: the unconditional variance given the parametersstartingvals
: the starting values for all parametersconstraints
: necessary (but usually not sufficient) box constraints that keep the model in the stationary region.subsetmask
is related to subset GARCH models: it takes a large model (say GARCH{2, 2}
) and a tuple (say (1, 1)
) representing a smaller model, and returns a BitArray
that has false
wherever the parameter in the large model is zero for the subset model, e.g.,
julia>julia> mask = ARCHModels.subsetmask(TGARCH{2, 2, 2}, (0, 1, 1))
7-element BitArray{1}:
1
0
0
1
0
1
0
subsettuple
is basically the inverse function to subsetmask
. It takes a BitArray
and returns the tuple:
julia> ARCHModels.subsettuple(TGARCH{2, 2, 2}, mask)
(0, 1, 1)
These last two functions are identical for EGARCH
and TGARCH
, but will be different for FamGARCH
because you have 3 parameters relating to q
.
With the above done, you should get simulation, fitting, model selection, etc. for free.
Have fun! Simon
@s-broda you may have seen this, the textbook Financial Risk Forecasting now uses ARCHModels.jl. They have blanks for: APARCH OGARCH
Cool, thanks for the pointer!
APARCH should be coming soon.
Hello, first thanks for all the hard work that went into this great package! I was wondering if allowing to fit ARMA models directly would still fit within the scope. I know that I can do:
fit(ARCH{0}, data; meanspec=ARMA{p,q})
but a wrapper like this would be very neat:
fit(ARMA{p,q}, data)
I'm saying this since it took me a minute to figure this out from the docs and I was close to moving on ;) Also, it would be nice to allow specifying no intercept in ARMA, maybe like this:
fit(ARMA{p,q}, data; meanspec=NoIntercept())
Thanks for the suggestion. The convenience methods for fit(::Type{<:ARMA}, data
(and selectmodel
) have been implemented (https://github.com/s-broda/ARCHModels.jl/commit/99b389f677edb1f43f2bfc31176ff2c635ea36dc) and and will make it into 2.2.
Fitting ARMA models w/o intercept the way you suggest would require a bigger rewrite, so this is unlikely to happen. It would be quite easy though to implement an additional meanspec
(ZeroMeanARMA
?) that does this. Not sure if that would be worth it though, in my experience it is pretty rare that one needs this.
Hi and thanks for the awesome package! It's becoming common to automate time-series models. Forecast.R has auto.arima() which returns the "best" Arima model according to some criteria defined by the user.
It would be awesome if your package has some options to select the "best" ARCH model. Here is some discussion about doing this in R. Also check out this post.
Thanks again for your awesome package.