
Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

BUG: Statespace: complex step differentiation with stationary initialization #2839

Closed ChadFulton closed 8 years ago

ChadFulton commented 8 years ago

Stationary initialization computes the initial state covariance matrix for the Kalman filter by solving a discrete Lyapunov equation. The discrete Lyapunov solver (in this case from Scipy) explicitly supports complex matrices, meaning that it supports operation as a complex function.

As I understand it, complex step differentiation requires the underlying function to be a real function of real variables (e.g. http://www.math.u-psud.fr/~maury/paps/NUM_CompDiff.pdf) and so when computing the initial state covariance matrix for the complex step derivative, we want the Lyapunov solver to not act as a complex function.
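For context, the complex-step trick computes the derivative of a real analytic function from the imaginary part of a tiny complex perturbation, with no subtractive cancellation. A minimal sketch (the test function here is arbitrary, chosen only for illustration):

```python
import numpy as np

# Complex-step differentiation: for a real analytic f,
# f'(x) ~= Im(f(x + i*h)) / h, exact to machine precision even for tiny h.
f = lambda x: np.exp(x) / np.sqrt(x)   # any smooth real function of a real variable
x, h = 1.5, 1e-20
deriv = f(x + 1j * h).imag / h
```

This only works if every operation inside `f` behaves as a real function extended to complex arguments; a conjugation anywhere in the chain destroys the imaginary perturbation, which is exactly the problem described above.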

In practice, this means we want it to use regular transposes rather than conjugate transposes (as it would if it were a complex function), even though the input matrices will be complex.

In a related issue, complex step differentiation is only appropriate when the underlying data and parameters are real; if they are complex, then we need to initialize the model using the Lyapunov solver operating as a complex function and compute numerical derivatives using finite differencing.

One implication of this is that the state space models need to distinguish between loglike calls performed as usual (in which case complex input parameters would preclude using complex step differentiation and the Lyapunov-as-complex function would be used for initialization) and loglike calls performed in gradient approximations (in which case complex input parameters are a result of the differentiation process and the Lyapunov-as-real function would be used).
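The failure mode can be seen in a hypothetical scalar example, an AR(1) whose stationary variance solves a 1x1 discrete Lyapunov equation (closed form P = 1/(1 - phi^2), so dP/dphi = 2*phi/(1 - phi^2)^2):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

phi, h = 0.5, 1e-10
analytic = 2 * phi / (1 - phi**2)**2

# Complex step through scipy's solver fails: solve_discrete_lyapunov solves
# A X A^H - X + Q = 0 with a conjugate transpose, so a*conj(a) = |a|^2
# annihilates the imaginary perturbation and the "derivative" is ~0.
P = solve_discrete_lyapunov(np.array([[phi + 1j * h]]), np.array([[1.0]]))
cs_conjugate = P[0, 0].imag / h

# The same solve with a plain (non-conjugate) transpose, P = Q / (1 - a*a)
# in the scalar case, preserves the perturbation and recovers dP/dphi.
a = phi + 1j * h
cs_plain = (1.0 / (1.0 - a * a)).imag / h
```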

josef-pkt commented 8 years ago

Does this produce incorrect numbers for the differentiation?
I don't think I saw anything about this when I did the complex derivatives.

Can you put in an internal switch to use standard forward differences in this case?

(labels?)

ChadFulton commented 8 years ago

I think it does; I'm trying to come up with a minimal example.

ChadFulton commented 8 years ago

Here's the minimal example in a notebook:

http://nbviewer.jupyter.org/gist/ChadFulton/49094a5192b771b33f98

Edit: made a few changes to the notebook for clarity; use ?flush_cache=true to see them in the next few minutes.

Incorrect numbers are produced by using complex step differentiation along with the complex version of the discrete Lyapunov solver.

josef-pkt commented 8 years ago

good notebook.

Can you write a "real" wrapper around the scipy function, e.g. by sending the real and imaginary parts separately through solve_discrete_lyapunov? Or is it better just to use finite differences?
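One possible shape for such a wrapper, sketched with a hypothetical helper (not the statsmodels implementation): for small systems the equation with a plain transpose, X = A X A^T + Q, can be solved directly by vectorization, which keeps every operation conjugation-free so complex-step perturbations survive.

```python
import numpy as np

def solve_discrete_lyapunov_real_ops(a, q):
    """Hypothetical helper: solve X = A X A^T + Q using only plain
    transposes (no conjugation), via the direct vectorization
    (I - kron(A, A)) vec(X) = vec(Q)."""
    a = np.asarray(a)
    n = a.shape[0]
    lhs = np.eye(n * n) - np.kron(a, a)
    x = np.linalg.solve(lhs, np.asarray(q).reshape(-1))
    return x.reshape(n, n)
```

For real inputs this agrees with the conjugating solver (conjugation is a no-op on reals); for the complex inputs produced by complex-step differentiation it behaves as a real function of real variables. The O(n^6) cost of the Kronecker solve is only acceptable for small state dimensions.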

ChadFulton commented 8 years ago

Another problem: Cholesky factorization, which requires Hermitian matrices (i.e. real diagonal elements).

```python
import numpy as np

print(np.linalg.cholesky([[4]]))
print(np.linalg.cholesky([[4 + 1e-9j]]))
print(np.linalg.cholesky([[4 + 1e1j]]))
print(np.linalg.cholesky([[4 + 1e9j]]))
```

yields

```
[[ 2.]]
[[ 2.+0.j]]
[[ 2.+0.j]]
[[ 2.+0.j]]
```
ChadFulton commented 8 years ago

It appears that in each of these cases, the problem can be "solved" by providing less support for complex numbers.

In the discrete Lyapunov case (used to initialize the state covariance matrix in the stationary case), by replacing conj().transpose() with transpose() in a couple of places I can get agreement between finite difference and complex step.

In the Cholesky factorization case (used to transform arbitrary matrices into stationary coefficient matrices for VARs and DFMs), replacing the Cholesky with a matrix square root function allows (pretty close) agreement between finite difference and complex step.
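The difference is visible in a hypothetical 1x1 example, where the "matrix" square root is just sqrt(x) and the derivative at 4 is 0.25:

```python
import numpy as np
from scipy.linalg import sqrtm

h = 1e-12
x = np.array([[4.0 + 1j * h]])

# Cholesky assumes a Hermitian input, and LAPACK computes the factor from
# the real part of the diagonal, so the perturbation is discarded.
d_chol = np.linalg.cholesky(x)[0, 0].imag / h

# A general matrix square root treats the input as an arbitrary complex
# matrix and propagates the perturbation, recovering f'(4) = 0.25.
d_sqrtm = sqrtm(x)[0, 0].imag / h
```

This illustrates why the outputs above are identical regardless of the imaginary part: the complex-step derivative through a Cholesky factorization is identically zero.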

ChadFulton commented 8 years ago

Similarly, in the Kalman filter itself, LU factorization rather than Cholesky factorization must be used for the complex step case.
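A sketch of why, using a hypothetical 1x1 forecast-error covariance F = 4, for which d log|F| / dF = 1/4:

```python
import numpy as np
from scipy.linalg import lu_factor

h = 1e-10
F = np.array([[4.0 + 1j * h]])

# Via Cholesky, log|F| = 2*sum(log(diag(L))); the factor is computed from
# the real part only, so the complex-step derivative collapses to zero.
L = np.linalg.cholesky(F)
d_chol = (2.0 * np.log(L[0, 0])).imag / h

# Via LU, log|F| = sum(log(diag(U))) (up to the permutation sign); the
# factorization treats F as a general complex matrix and the derivative
# survives.
lu, piv = lu_factor(F)
d_lu = np.log(lu[0, 0]).imag / h
```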

ChadFulton commented 8 years ago

For the Cholesky factorization in the transformation functions, I don't think it's worth rewriting the transform / untransform functions using the matrix square root (a lot of work in the Cython case, and I don't know if it's justified in this case to use matrix square root rather than Cholesky).

Instead, I think it's better to use the chain rule here, where complex step differentiation is done only on transformed parameters, and we use finite difference differentiation on the transformation step.
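A sketch of this hybrid chain rule, with hypothetical stand-ins for both functions (the real transform and log-likelihood are the statsmodels ones; these are toy substitutes chosen to be smooth):

```python
import numpy as np

def transform(u):
    # hypothetical transform: unconstrained -> constrained parameters,
    # standing in for the Cholesky-based stationarity transformation
    return u / np.sqrt(1.0 + u**2)

def loglike(c):
    # smooth stand-in for the log-likelihood; safe for complex arguments
    return np.sum(np.log(1.0 + c**2))

u = np.array([0.3, -0.8])
h_cs, h_fd = 1e-15, 1e-7
c0 = transform(u)

# complex-step gradient of loglike with respect to constrained parameters
g = np.array([loglike(c0 + 1j * h_cs * e).imag / h_cs for e in np.eye(u.size)])

# central finite-difference Jacobian of the (complex-step-unsafe) transform
J = np.column_stack([(transform(u + h_fd * e) - transform(u - h_fd * e))
                     / (2 * h_fd) for e in np.eye(u.size)])

# chain rule: gradient with respect to the unconstrained parameters
score = J.T @ g
```

The transformation is typically cheap and low-dimensional, so the finite-difference error it introduces is small relative to differencing the full likelihood.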

ChadFulton commented 8 years ago

I will have a PR soon; it will substantially improve standard error (bse) estimation and the comparison to e.g. Stata results.