Addition of skewed distributions for the innovation terms

chm-von-tla commented 3 years ago

Hello,

I did my thesis on a comparative analysis of Value-at-Risk methodologies, and I used this package quite a lot (thank you!).

One addition I had to make was to include skewed distributions for the innovation terms since this usually leads to superior VaR forecasting performance (see: https://doi.org/10.1002/for.2303). The distribution I implemented was Hansen's Standardized Skewed T Distribution (see: https://doi.org/10.2307/2527081) which is defined as:
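For reference, the density from the linked paper is usually written as follows, with η > 2 the degrees of freedom and -1 < λ < 1 the skewness parameter. The symbol z for the standardized innovation is a notational assumption here, so check the form against the paper before relying on it:

$$
g(z \mid \eta, \lambda) =
\begin{cases}
bc \left(1 + \dfrac{1}{\eta - 2} \left(\dfrac{bz + a}{1 - \lambda}\right)^{2}\right)^{-(\eta+1)/2}, & z < -a/b, \\[2ex]
bc \left(1 + \dfrac{1}{\eta - 2} \left(\dfrac{bz + a}{1 + \lambda}\right)^{2}\right)^{-(\eta+1)/2}, & z \ge -a/b,
\end{cases}
$$

$$
a = 4 \lambda c \, \frac{\eta - 2}{\eta - 1}, \qquad
b^{2} = 1 + 3\lambda^{2} - a^{2}, \qquad
c = \frac{\Gamma\left(\frac{\eta+1}{2}\right)}{\sqrt{\pi(\eta - 2)}\,\Gamma\left(\frac{\eta}{2}\right)}.
$$

Setting λ = 0 gives a = 0 and b = 1, which recovers the standardized Student's t density.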

[Equation image: density of Hansen's standardized skewed t distribution]

Its analytical quantile function is provided in https://doi.org/10.1016/s0165-1889(02)00079-9 and is defined as:

[Equation image: quantile function of Hansen's standardized skewed t distribution]

with A_η^-1 being the inverse CDF of a t-distribution with η degrees of freedom.
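For reference, the quantile function can be obtained by inverting the piecewise CDF of Hansen's distribution, and is usually written as follows. Here a and b are the constants from Hansen's density, and q for the probability level is a notational assumption:

$$
F^{-1}(q) =
\begin{cases}
\dfrac{1}{b}\left[(1-\lambda)\sqrt{\dfrac{\eta-2}{\eta}}\; A_{\eta}^{-1}\!\left(\dfrac{q}{1-\lambda}\right) - a\right], & q < \dfrac{1-\lambda}{2}, \\[2ex]
\dfrac{1}{b}\left[(1+\lambda)\sqrt{\dfrac{\eta-2}{\eta}}\; A_{\eta}^{-1}\!\left(\dfrac{q+\lambda}{1+\lambda}\right) - a\right], & q \ge \dfrac{1-\lambda}{2}.
\end{cases}
$$

Both branches give -a/b at q = (1-λ)/2, so the function is continuous at the mode.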

Another approach that could be taken is to adopt the framework found in (https://dx.doi.org/10.2139/ssrn.821) under which any unimodal continuous symmetric distribution can become asymmetric by changing the scale at each side of the mode. In the same paper they demonstrate their framework by creating a skewed version of the Standardized GED.
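A minimal sketch of the scale-at-each-side idea, assuming the common inverse-scale parameterization with a skewness parameter ξ > 0. The names base_pdf and skewed_pdf are hypothetical, the standard normal is used only as an illustrative symmetric base, and the linked paper's exact parameterization and standardization may differ:

```julia
# Symmetric base density: standard normal, chosen only for illustration.
base_pdf(x) = exp(-x^2 / 2) / sqrt(2π)

# Two-piece skewing (hypothetical helper): stretch the right side of the mode
# by ξ and shrink the left side by 1/ξ. ξ = 1 recovers the symmetric base.
# The factor 2 / (ξ + 1/ξ) keeps the total probability mass equal to 1.
function skewed_pdf(x, ξ)
    scale = x >= 0 ? 1 / ξ : ξ
    return 2 / (ξ + 1 / ξ) * base_pdf(x * scale)
end

# Crude Riemann-sum check that the skewed density still integrates to about 1.
step = 0.001
total_mass = sum(skewed_pdf(x, 1.5) for x in -12:step:12) * step
```

Note that the result of this sketch is not standardized: for ξ ≠ 1 its mean is nonzero and its variance differs from 1, so an innovation distribution built this way would still need to be recentered and rescaled.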

If you think that any of the above would be a worthwhile addition to this package, I provide as a starting point my implementation of Hansen's SKT-Distribution. (#86) There are at least two problems with my code:

- startingvals is essentially duplicated from the corresponding function for StdT. I tried startingvals(::Type{<:StdSkewT}, data::Array{T}) where {T<:AbstractFloat} = [startingvals(StdT,data), zero(T)], but it didn't work.
- Performance is bad. On the BG96 dataset, fitting a GARCH(1,1) with StdSkewT errors was almost 10 times slower than the same model with StdT errors.

s-broda commented 3 years ago

Thanks for your contribution! I'll be happy to incorporate this.

To get this merged, could I ask you to do the following:

- Add some tests. Ideally coverage won't drop below the current almost 100%.
- CI is failing at the moment. This is because of a problem in the doctests, see here: https://travis-ci.org/github/s-broda/ARCHModels.jl/jobs/761289701. The problem is in line 332 of usage.md. You could either exclude the StdSkewT from consideration there, or make sure that the test doesn't fail.
- To avoid the problem with the duplicated code for the starting values, you can use startingvals(::Type{<:StdSkewT}, data::Array{T}) where {T<:AbstractFloat} = [startingvals(StdT, data)..., zero(T)] (note the splatting operator ...).
- I've left some more comments in the PR. Most importantly, rand and quantile didn't seem to work before,
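The splatting point in the list above comes down to how Julia builds array literals. A minimal sketch, using a hypothetical stand-in base_startingvals (not the package's startingvals) that simply returns a one-element vector:

```julia
# Hypothetical stand-in for startingvals(StdT, data); it only needs to return a Vector.
base_startingvals(data::Vector{T}) where {T<:AbstractFloat} = T[4]

data = [0.1, -0.2, 0.3]

# Without splatting, the inner vector becomes a single element of the outer array,
# so the result is a two-element array whose first entry is itself a vector.
nested = [base_startingvals(data), zero(Float64)]

# With splatting, the inner vector's elements are spliced into the literal,
# giving a flat vector of floats.
flat = [base_startingvals(data)..., zero(Float64)]

# vcat produces the same flat result without the splat.
flat_vcat = vcat(base_startingvals(data), zero(Float64))
```

An optimizer expecting a flat vector of floats would presumably reject the nested form, which is consistent with the first attempt not working.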

Do you have an idea why this is so much slower? I didn't see any type instabilities, so maybe it's just a result of there being another parameter to estimate.

chm-von-tla commented 3 years ago

Thanks for the excellent feedback. I will be making the changes soon. I should probably also make changes to the documentation to reflect that another error distribution is available.

> Do you have an idea why this is so much slower? I didn't see any type instabilities, so maybe it's just a result of there being another parameter to estimate.

I haven't done any profiling, but I strongly suspect that it is because of all those expensive a, b, c, S functions in logkernel [1].

Consider

@inline logkernel(::Type{<:StdT}, x, coefs, iv) = (-(coefs[1] + 1) / 2) * log1p(abs2(x) *iv)

versus

@inline logkernel(d::Type{<:StdSkewT}, x, coefs, iv) = (-(coefs[1] + 1) / 2) * log1p(1/abs2(1+coefs[2]*S(d,x,coefs)) * abs2(b(d,coefs)*x+a(d,coefs)) *iv)

I just spotted an inefficiency in my logkernel function. I call the function S, which itself calls a and b, and later I call a and b again. I am not sure whether the compiler can optimize this away. I will change it and see if there is any considerable gain. I suppose this code could be optimized even further, but I suspect that it will always be noticeably slower than the StdT case because of the added complexity.

[1] The a, b, c functions correspond to equations (11), (12), (13), and the S function determines on which side of the mode we are [the mode being (-a/b)], so that we choose the correct sign (hence the name S) for the parameter λ in equation (10).
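The repeated-evaluation problem can be sketched as follows. This is not the code from the pull request: the names hansen_a, hansen_b, hansen_c, logkernel_naive and logkernel_cached are hypothetical, the formulas for a, b, c follow Hansen's paper, iv is taken to be 1/(η - 2), and the SpecialFunctions package is assumed to be installed:

```julia
using SpecialFunctions: loggamma

# Hansen's constants for η > 2 degrees of freedom and skewness -1 < λ < 1.
hansen_c(η) = exp(loggamma((η + 1) / 2) - loggamma(η / 2)) / sqrt(π * (η - 2))
hansen_a(η, λ) = 4λ * hansen_c(η) * (η - 2) / (η - 1)
hansen_b(η, λ) = sqrt(1 + 3λ^2 - hansen_a(η, λ)^2)

# Naive version: a and b (and the gamma functions behind them) are
# re-evaluated for the sign test and again for the kernel itself.
function logkernel_naive(x, η, λ)
    s = x < -hansen_a(η, λ) / hansen_b(η, λ) ? -1 : 1
    z = (hansen_b(η, λ) * x + hansen_a(η, λ)) / (1 + s * λ)
    return -(η + 1) / 2 * log1p(z^2 / (η - 2))
end

# Cached version: a and b are computed once and reused.
function logkernel_cached(x, η, λ)
    a = hansen_a(η, λ)
    b = sqrt(1 + 3λ^2 - a^2)
    s = b * x + a < 0 ? -1 : 1   # left of the mode -a/b gets the minus sign
    z = (b * x + a) / (1 + s * λ)
    return -(η + 1) / 2 * log1p(z^2 / (η - 2))
end
```

Both functions return the same value; the cached one simply avoids repeating the gamma-function evaluations. Since a and b depend only on the coefficients and not on x, they could in principle be computed once per likelihood evaluation rather than once per observation, although whether the package's internal interface allows that is not something this sketch addresses.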

chm-von-tla commented 3 years ago

The inefficiency in logkernel was huge. With the change introduced by my new commits, the 9x slowdown when fitting a GARCH(1,1) model on BG96 with StdSkewT errors, compared to StdT errors, has been reduced to about 3x, which, taking the added complexity into account, I would deem acceptable performance.

chm-von-tla commented 3 years ago

Since my pull request for the addition of Hansen's Skewed T Distribution has been merged to master, it is time to close this issue.

As a closing thought, I will point to a direction for further improvement on this subject. If anyone else (or my future self :) ) has the time and motivation, a skewed standardized GED, as found in https://dx.doi.org/10.2139/ssrn.821, could also be added.

s-broda / ARCHModels.jl

Addition of skewed distributions for the innovation terms #87