Open lukas-rokka opened 5 years ago
@lukas-rokka, thank you for submitting the issue! We’ll try to track it down.
The next steps:
@yizhang-cae, want to take a look?
I did a benchmark, with the example above, using Windows, rstan 2.18.2 and the microbenchmark package. For this particular setup, matrix_exp(tA) * B is actually 2-3 times faster than scale_matrix_exp_multiply(t, A, B).
How do you run the benchmark? If you use expose_stan_functions, then you do not calculate the gradients of that expression, and computing those is the real cost in a Stan program.
I would just like to note that we have disabled the scale_matrix_exp_multiply function in the current develop (and replaced it with the non-specialized code). This was done because very rare segfaults were reported for the specialized function. As far as I recall, @yizhang-cae hasn't yet found the time to address this. So the develop branch of Stan currently makes no difference between the two forms.
Yes, I used expose_stan_functions. So the benchmark was between the implementations of matrix_exp(tA) * B and scale_matrix_exp_multiply(t, A, B), not for a full Stan program.
Did some further benchmarking. scale_matrix_exp_multiply was actually 30-50% faster when the resulting system has a condition number closer to 1. So the slowdown is probably related to precision as well.
Personally, I'm happy using matrix_exp(tA) * B, and I fully understand that you might not address this issue in future releases. But the text in the function documentation, "algebraically equivalent to the less efficient form matrix_exp(tA) * B", is a bit misleading with the current implementation.
And a big thank you to you all for developing this great tool!
Or do you mean that scale_matrix_exp_multiply can be more efficient in an actual Stan program due to how gradients are calculated?
Yes, it can, but I haven't done the implementation myself, so I don't know for sure. The integrated function certainly does some tricks to compute the gradients more efficiently, and the numerical cost of almost any Stan operation lies in getting the gradients (the actual function value is usually relatively cheap to get in comparison).
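To make the cost argument concrete: even the simplest derivative of exp(tA) * B, the one with respect to the scalar t, equals A * exp(tA) * B and so needs work beyond the function value. The sketch below checks that identity by central finite differences in plain Python. It is only an illustration: the matrix A, the vector b, and all helper names are made up for this example, and it says nothing about how Stan Math actually computes its gradients.

```python
def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def expm_action(t, A, b, terms=40):
    """Approximate exp(t*A) @ b with a fixed-length Taylor series."""
    total = list(b)
    term = list(b)
    for k in range(1, terms + 1):
        # term_k = (t*A) @ term_{k-1} / k
        term = [t * x / k for x in matvec(A, term)]
        total = [s + x for s, x in zip(total, term)]
    return total

def d_expm_action_dt(t, A, b):
    """d/dt [exp(t*A) @ b] = A @ exp(t*A) @ b."""
    return matvec(A, expm_action(t, A, b))

# Hypothetical 2x2 system, chosen only for illustration.
A = [[0.0, 1.0], [-2.0, -3.0]]
b = [0.0, 1.0]
t, h = 0.2, 1e-6

analytic = d_expm_action_dt(t, A, b)
plus = expm_action(t + h, A, b)
minus = expm_action(t - h, A, b)
finite_diff = [(p - m) / (2 * h) for p, m in zip(plus, minus)]

print(analytic)
print(finite_diff)
```

Derivatives with respect to the entries of A are more involved than this single directional derivative, which is consistent with the remark that gradients dominate the cost.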
Hi,
Firstly, a big thank you to all developers who are contributing to this great project!
Coming back to the topic at hand, I just wanted to create an issue about this...
I had the same problem with the underlying method matrix_exp_action_handler().action(), which is called by scale_matrix_exp_multiply().
To the best of my knowledge, matrix_exp_action_handler().action() is causing the trouble, and commenting out lines 82-83 in stan/math/prim/mat/fun/matrix_exp_action_handler.hpp solves the issue. Because of this error in the implementation, the method terminates too early, before converging to the right solution.
I am not 100% sure if this is the correct fix, but in my test cases (arbitrary random matrices, some Poisson equation systems and a Maxwell-Bloch numerical solver) it seems to provide correct results. Also, after comparing it to the original Matlab implementation by the authors of the method, it seems right.
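How an early exit can spoil a series for exp(A) * b can be sketched in a few lines of plain Python. This is a generic illustration, not a reproduction of the code at the lines mentioned above: the matrix, the tolerance, the helper names and both stopping rules are assumptions made up for this example. The matrix is chosen so that the first Taylor term is tiny while the second is not, so a rule that looks at only one term stops after a single step.

```python
import math

def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm_inf(v):
    return max(abs(x) for x in v)

def expm_action(A, b, tol=1e-10, max_terms=200, two_term_rule=True):
    """Approximate exp(A) @ b by summing the Taylor terms A^k b / k!.

    two_term_rule=True stops only when two consecutive terms are small;
    two_term_rule=False stops as soon as a single term is small.
    """
    total = list(b)
    term = list(b)
    prev_norm = norm_inf(term)
    for k in range(1, max_terms + 1):
        term = [x / k for x in matvec(A, term)]
        total = [s + x for s, x in zip(total, term)]
        cur_norm = norm_inf(term)
        check = prev_norm + cur_norm if two_term_rule else cur_norm
        if check <= tol * norm_inf(total):
            break
        prev_norm = cur_norm
    return total

# Hypothetical badly scaled matrix with A @ A approximately the identity,
# so the first component of exp(A) @ [1, 0] is cosh(1).
A = [[0.0, 1e12], [1e-12, 0.0]]
b = [1.0, 0.0]

early = expm_action(A, b, two_term_rule=False)
careful = expm_action(A, b, two_term_rule=True)

print("single-term rule:", early[0])
print("two-term rule:   ", careful[0])
print("reference cosh(1):", math.cosh(1.0))
```

With the single-term rule the loop exits after the first term and returns the starting vector almost unchanged, while the two-term rule keeps going until the series has actually converged.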
Should I open a new issue for the matrix_exp_action_handler().action() function or will it be fixed based on this one? I am new to this...
I have almost posted a whole new issue before stumbling upon this one, so I have a description, a minimal working example and some expected outputs prepared. :)
@lukas-rokka @wds15 @syclik
Sorry, I was stranded on something else. I'll take a look and address this.
Sorry for the long~~ delay. I was finally able to fix this in #2529. Can someone take a look? I believe this is the same issue that @chvandorp is looking at in #2529.
Note that currently the algorithm only applies to prim. In rev mode it's more involved to get the matrix derivative right. Currently it's just a wrapper around matrix_exp and multiply.
Description
scale_matrix_exp_multiply(t, A, B) doesn't always equal matrix_exp(tA) * B. Might be a precision issue, where scale_matrix_exp_multiply does some sort of approximation and therefore doesn't give the same result as matrix_exp(tA) * B when A has a low condition number.
Example
This example is based on the discretization of Linear Time-Invariant ODE systems (https://github.com/eddelbuettel/rcppkalman/blob/master/src/ltidisc.cpp).
Setting dt = 0.1 when calling the lti_disc function returns the expected result. Somewhere at dt > 0.15, scale_matrix_exp_multiply fails to return a result similar to matrix_exp(tA) * B.
For feature requests:
scale_matrix_exp_multiply
scale_matrix_exp_multiply(t, A, B) is an approximation of matrix_exp(tA) * B.
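The two forms being compared can be sketched in plain Python as follows. This is not the lti_disc example from the linked repository and not Stan's algorithm: the matrices A and B are hypothetical, all helper names are made up, and a fixed-length Taylor series stands in for the real methods. The first function forms exp(tA) as a full matrix and then multiplies by B; the second accumulates the product directly and never forms exp(tA). When both series are run to convergence they agree, which is what "algebraically equivalent" means here.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def scaled(c, X):
    return [[c * x for x in row] for row in X]

def added(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def expm_then_multiply(t, A, B, terms=30):
    """Form exp(t*A) as a full matrix by Taylor series, then multiply by B."""
    n = len(A)
    tA = scaled(t, A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [row[:] for row in term]
    for k in range(1, terms + 1):
        term = scaled(1.0 / k, matmul(term, tA))
        result = added(result, term)
    return matmul(result, B)

def expm_action(t, A, B, terms=30):
    """Accumulate exp(t*A) @ B directly, never forming exp(t*A)."""
    tA = scaled(t, A)
    term = [row[:] for row in B]
    result = [row[:] for row in B]
    for k in range(1, terms + 1):
        term = scaled(1.0 / k, matmul(tA, term))
        result = added(result, term)
    return result

def exact(t):
    """Closed form of exp(t*A) @ B for the hypothetical A and B below."""
    return [[math.exp(-t) - math.exp(-2 * t)],
            [-math.exp(-t) + 2 * math.exp(-2 * t)]]

# Hypothetical system with eigenvalues -1 and -2.
A = [[0.0, 1.0], [-2.0, -3.0]]
B = [[0.0], [1.0]]

for dt in (0.1, 0.2):
    print(dt, expm_then_multiply(dt, A, B), expm_action(dt, A, B), exact(dt))
```

The direct form only needs products with B, which is cheaper when B has few columns. Any difference between the two results in practice comes from where each series is truncated, not from the algebra.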