precision of scale_matrix_exp_multiply

lukas-rokka commented 5 years ago

Description

scale_matrix_exp_multiply(t, A, B) doesn't always equal matrix_exp(tA) * B. Might be a precision issue, where scale_matrix_exp_multiply does some sort of approximation and therefore doesn't give same result as matrix_exp(tA) * B when A has a low condition number.

Example

This example is based on the discretization of Linear Time-Invariant ODE systems (https://github.com/eddelbuettel/rcppkalman/blob/master/src/ltidisc.cpp).

Ac =t(matrix(c(-4.47584,0.0732692,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0.0183173,-0.0366346,0.0183173,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0.0146538,-0.179407,0,0,0.0268619,0,0.00639568,0,0.0639568,0,0,0.0426379,0.0249012,0.00748033,0.00498689,0.00415574,
0,0,0,-5.86896,0.447212,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0.223606,-0.447212,0.223606,0,0,0,0,0,0,0,0,0,0,0,
0,0,0.328758,0,0.447212,-2.33289,0,0.0493138,0,0.493138,0,0,0.328758,0.685714,0.0576769,0.0384513,0.0320427,
0,0,0,0,0,0,-42.15,11.4398,0,0,0,0,0,0,0,0,0,
0,0,2.3972,0,0,1.51023,11.4398,-26.3403,0,3.59579,0,0,2.3972,5,0.420561,0.280374,0.233645,
0,0,0,0,0,0,0,0,-0.122394,0.122394,0,0,0,0,0,0,0,
0,0,1.09586,0,0,0.690393,0,0.164379,1.10155,-6.43376,0,0,1.09586,2.28571,0.192256,0.128171,0.106809,
0,0,0,0,0,0,0,0,0,0,-0.00471794,0.0014969,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0.0190116,-0.0402984,0.0212869,0,0,0,0,
0,0,0.0426379,0,0,0.0268619,0,0.00639568,0,0.0639568,0,0.0170295,-0.334747,0.177866,0.00748033,0.00498689,0.00415574,
0,0,0.269588,0,0,0.606573,0,0.144422,0,1.44422,0,0,1.92563,-4.39043,0.0385126,0.15405,0.192563,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1), 17))

stan_code <- "
functions  {
  // Discretize Linear Time-Invariant ODE with Gaussian Noise
  matrix lti_disc(int m, matrix Ac, matrix Qc, real dt) {
    int m1 = m+1;
    int m2 = m*2;
    matrix[m2, m2] phi;
    matrix[m2, m] OI;
    matrix[m2, m] AB;
    matrix[m2, m] AB2;
    matrix[m, m] Q;

    OI[1:m, 1:m] = rep_matrix(0., m, m);
    OI[m1:m2, 1:m] = diag_matrix(rep_vector(1., m));

    phi[1:m, 1:m] = Ac;
    phi[1:m, m1:m2] = Qc;
    phi[m1:m2, 1:m] = rep_matrix(0., m, m);
    phi[m1:m2, m1:m2] = -Ac';

    AB = matrix_exp(phi*dt) * OI;
    AB2 = scale_matrix_exp_multiply(dt, phi, OI);

    // return the difference between the two methods. 
    return AB - AB2;

    // The actual return code 
    //Q = AB[1:m, ] / AB[m1:m2, ];
    //return Q;
  }
}
parameters {}
model {}"

expose_stan_functions(stanc(model_code = stan_code))

# should be close to zero
sum(lti_disc(17, Ac, Qc=diag(c(rep(0.1, 17))), dt=0.1))  # is close to zero
sum(lti_disc(17, Ac, Qc=diag(c(rep(0.1, 17))), dt=1))  # far from zero

Setting the dt = 0.1 when calling the ´lti_disc´ function returns an expected result. Somewhere at dt > 0.15 the scale_matrix_exp_multiply fails to return similar result as matrix_exp(tA) * B.

For feature requests:

if possible, let the user set the precision of scale_matrix_exp_multiply
make it clear in the documentation that scale_matrix_exp_multiply(t, A, B) is an approximation of matrix_exp(tA) * B.

syclik commented 5 years ago

@lukas-rokka, thank you for submitting the issue! We’ll try to track it down.

The next steps:

create a unit test showing this behavior (the example gets us almost all the way there)
determine what the problem is
fix it.

@yizhang-cae, want to take a look?

lukas-rokka commented 5 years ago

I did a benchmark, with the example above, using windows, rstan 2.18.2 and the microbenchmark package. For this particular setup, matrix_exp(tA) * B is actually 2-3 times faster than scale_matrix_exp_multiply(t, A, B).

wds15 commented 5 years ago

How do you run the benchmark? In case you use expose_stan_functions then you do not calculate the gradients of that expression ... and that is the real bummer for a Stan program.

I just would like to note that we have disabled the scale_matrox_exp_multiply function in the current develop (and replaced it with the non-special code). This has been done has very rare segfaults were reported on the specialized function. As far as I recall @yizhang-cae hasn't yet found the time to address this. So develop of Stan will make no difference between things.

lukas-rokka commented 5 years ago

Yes, I used the expose_stan_functions. So the benchmark was between the implementation of the matrix_exp(tA) * B and scale_matrix_exp_multiply(t, A, B), not for a full Stan program.

Did some further benchmarking. The scale_matrix_exp_multiply was actually 30-50 % faster when the resulting system has a condition number closer to 1. So the slowdown is probable related to precision as well.

Personally, I'm happy using the matrix_exp(tA) * B and I fully understand that you might not address this issue in future releases. But the text in the function documentation, "algebraically equivalent to the less eﬃcient form matrix_exp(tA) * B", is a bit misleading with current implementation.

And a big thank you to you all for developing this great tool!

lukas-rokka commented 5 years ago

or, do you mean that scale_matrox_exp_multiply can be more efficient in an actual Stan program due to how gradients are calculated?

wds15 commented 5 years ago

Yes... it can... but I haven't done the implementation myself so that I don't know... but certainly the integrated function does some tricks to get the gradients more efficient - and the numerical cost of almost any Stan operation is getting the gradients (while the actual function value is most of the time relatively cheap to get in comparison).

alekpikl commented 5 years ago

Hi,

Firstly, a bit thank you to all developers who are contributing to this great project!

Coming back to the topic at hand, I just wanted to create an issue about this... I had the same problem with the underlying method matrix_exp_action_handler().action(), which is called by scale_matrix_exp_multiply().

To the best of my knowledge, the matrix_exp_action_handler().action() is causing the trouble and commenting out lines 82-83 in the stan/math/prim/mat/fun/matrix_exp_action_handler.hpp solves the issue. The method terminates too early due to this error in the implementation before converging to the right solution.

I am not 100% sure if this is the correct fix, but in my test cases (arbitrary random matrices, some poisson equasion systems and a Maxwell-Bloch Numerical solver) it seems to provide correct results. Also, after comparing it to the original Matlab implementation by the authors of the method, it seems right.

Should I open a new issue for the matrix_exp_action_handler().action() function or will it be fixed based on this one? I am new to this... I have almost posted a whole new issue before stumbling upon this one, so I have a description, a minimal working example and some expected outputs prepared. :) @lukas-rokka @wds15 @syclik

yizhang-yiz commented 5 years ago

Sorry being stranded on something else. I'll take a look and address this.

yizhang-yiz commented 2 years ago

Sorry for the long~~ delay. I was finally able to fix this in #2529. Can someone take a look? I believe this is the same issue that @chvandorp is looking at in #2529.

Note that currently the algorithm only applies to prim. In rev mode it's more involved to get the matrix derivative right. Currently it's just a wrapper around matrix_exp and multiply.

stan-dev / math

precision of scale_matrix_exp_multiply #1146

Description

Example