markovmodel / msmtools

Tools for estimating and analyzing Markov state models
GNU Lesser General Public License v3.0

dense transition matrix estimation is SLOW #71

Closed franknoe closed 8 years ago

franknoe commented 8 years ago

I'm not sure why we haven't implemented the dense reversible transition matrix estimation in C yet (it's still using my old code). Perhaps it's because it is usually not needed - most MSM transition matrices are sparse - or perhaps because the numpy implementation is not too bad when the matrix is large enough. Anyway, here's a situation where our numpy code is unacceptably slow, so we should replace it with a C implementation as soon as possible.

In HMM estimation you have small and dense count matrices, and you need to repeat the transition matrix estimation step many times (for every EM iteration). Have a look at this count matrix:

C = np.array([[  5.76299276e+03,   2.30971393e+01,   1.35014478e+01,   1.38196286e+01,    4.00599140e+00],
              [  2.97761114e+01,   1.37491059e+03,   1.41871744e+00,   9.26348162e+00,    3.97745659e+00],
              [  6.14488700e+00,   1.37336016e-01,   6.22552515e+02,   2.11491958e+00,    8.25797984e-03],
              [  8.79300883e+00,   7.24164262e+00,   4.83499306e-01,   1.53880796e+03,    8.48679238e-03],
              [  2.20553755e-04,   1.00618203e-03,   3.97325911e-06,   5.52592952e-05,    4.76942872e+02]])

Calling msmtools.estimation.transition_matrix defaults to the dense implementation, so I am calling the methods explicitly to show the difference (and also setting the convergence parameters explicitly, because they actually have different defaults in the sparse and dense implementations - but that is a separate issue). Dense (default) implementation:

%timeit P = msmest.transition_matrix(C, reversible=True, method='dense', maxerr=1e-8, maxiter=1000000)
1 loops, best of 3: 19.7 s per loop

Sparse code:

%timeit P = msmest.transition_matrix(C, reversible=True, method='sparse', maxerr=1e-8, maxiter=1000000)
10 loops, best of 3: 178 ms per loop
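
For completeness, a minimal self-contained version of this comparison (the alias msmest for msmtools.estimation is inferred from the calls above; depending on the msmtools version, the sparse method may return a scipy.sparse matrix):

import numpy as np
import msmtools.estimation as msmest

C = np.array([[5.76299276e+03, 2.30971393e+01, 1.35014478e+01, 1.38196286e+01, 4.00599140e+00],
              [2.97761114e+01, 1.37491059e+03, 1.41871744e+00, 9.26348162e+00, 3.97745659e+00],
              [6.14488700e+00, 1.37336016e-01, 6.22552515e+02, 2.11491958e+00, 8.25797984e-03],
              [8.79300883e+00, 7.24164262e+00, 4.83499306e-01, 1.53880796e+03, 8.48679238e-03],
              [2.20553755e-04, 1.00618203e-03, 3.97325911e-06, 5.52592952e-05, 4.76942872e+02]])

# both calls should yield (numerically) the same reversible transition matrix
P_dense = msmest.transition_matrix(C, reversible=True, method='dense', maxerr=1e-8, maxiter=1000000)
P_sparse = msmest.transition_matrix(C, reversible=True, method='sparse', maxerr=1e-8, maxiter=1000000)
# densify the sparse result before comparing, if necessary
P_sparse = P_sparse.toarray() if hasattr(P_sparse, 'toarray') else P_sparse
assert np.allclose(P_dense, P_sparse, atol=1e-6)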

So the sparse implementation is 100 times faster! But this matrix is actually dense, so the speedup here comes entirely from the C implementation being about a factor of 100 faster. A dense C implementation will be faster still, because we save the additional overhead of sparse indexing. So we should replace that Python code with a dense C implementation.
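
For context, the dense code in question implements the standard self-consistent fixed-point iteration for the reversible maximum-likelihood transition matrix. A minimal numpy sketch of that iteration (with a simplified convergence check, not the exact msmtools code):

import numpy as np

def reversible_mle_dense(C, maxerr=1e-8, maxiter=1000000):
    # self-consistent iteration for the reversible maximum-likelihood
    # transition matrix; assumes a dense count matrix with positive entries
    C = np.asarray(C, dtype=np.float64)
    c = C.sum(axis=1)                    # row sums of the counts
    X = 0.5 * (C + C.T)                  # symmetric initial guess
    X /= X.sum()
    for _ in range(maxiter):
        x = X.sum(axis=1)
        # update: x_ij <- (c_ij + c_ji) / (c_i/x_i + c_j/x_j)
        X_new = (C + C.T) / (c[:, None] / x[:, None] + c[None, :] / x[None, :])
        X_new /= X_new.sum()
        if np.max(np.abs(X_new - X)) < maxerr:   # simplified convergence check
            X = X_new
            break
        X = X_new
    return X / X.sum(axis=1)[:, None]    # row-normalize to get P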

To see how dramatic this is: in HMMs you can easily have 100 or more EM iterations, in each of which such a maximum-likelihood problem is solved. At 100 iterations that is 100 x 178 ms ≈ 18 seconds versus 100 x 19.7 s ≈ 33 minutes, i.e. a difference of roughly 20 seconds versus half an hour for a single estimation. And that's for a single lag time.

@trendelkampschroer or @fabian-paul , either of you please assign yourself and take care of this as soon as possible. Thanks!

fabian-paul commented 8 years ago

We should check first whether the convergence criteria are actually the same in the two implementations.

franknoe commented 8 years ago

This comparison was already enforcing the same convergence criteria.

On 17/12/15 at 11:19, fabian-paul wrote:

We should check first whether the convergence criteria are actually the same in the two implementations.


fabian-paul commented 8 years ago

With unified convergence criteria it's 12.6 s per loop vs. 746 ms per loop.

trendelkampschroer commented 8 years ago

I am not sure we should use relative errors to check the convergence of pi; that seems very restrictive. But you have a point in that checking relative errors of pi amounts to checking absolute errors on the free energies. We have to decide.

The iterative algorithm works solely with the probabilities \pi, or rather with the non-normalized weights x; checking the convergence of \log(\pi) or \log(x) is a bit unintuitive from an algorithmic point of view.
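
For reference, the connection between the two criteria: with free energies f_i = -\log(\pi_i), an absolute change in f_i equals a relative change in \pi_i to first order, since |\Delta \log(\pi)| ≈ |\Delta \pi| / \pi. A small numerical illustration (numbers made up):

import numpy as np

pi_old = np.array([0.7, 0.2, 0.1])
pi_new = pi_old * (1.0 + np.array([1e-6, -2e-6, 5e-7]))   # tiny relative perturbations
pi_new /= pi_new.sum()

rel_err_pi = np.abs(pi_new - pi_old) / pi_old              # relative error on pi
abs_err_f = np.abs(np.log(pi_new) - np.log(pi_old))        # absolute error on f = -log(pi)
print(rel_err_pi)   # the two arrays agree ...
print(abs_err_f)    # ... to first order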

Am I correct in understanding that the algorithm takes a factor of ten longer if you check errors on the level of free energies?

fabian-paul commented 8 years ago

Ah no, that 12.6 s vs. 746 ms comparison is the dense Python implementation versus the sparse C implementation. But we should check how the change of criterion influences the run time, in order not to surprise users.

fabian-paul commented 8 years ago

The new version of the algorithm works with the normalized x vector (which converges to the stationary vector). Still, it isn't clear to me that convergence can actually be achieved when some elements of x become very small. Perhaps from a physical point of view one could introduce a new parameter for the "smallest interesting" value of x? Maybe one is interested in converging free energies around 10 kT, but not free energies as high as 30 kT. On the other hand, an absolute error criterion on the free energies already makes this possible (i.e. it allows ignoring high free energies).
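
A hypothetical sketch of such a criterion (the function and the parameter f_max are inventions for illustration, not msmtools API): require free-energy convergence only for states below the cutoff.

import numpy as np

def converged(x_old, x_new, maxerr=1e-8, f_max=10.0):
    # hypothetical check: ignore states with free energy -log(x) above
    # f_max (in kT), i.e. only require absolute free-energy convergence
    # for the "interesting" low-free-energy states
    relevant = -np.log(x_new) < f_max
    err = np.abs(np.log(x_new[relevant]) - np.log(x_old[relevant]))
    return bool(np.all(err < maxerr))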

fabian-paul commented 8 years ago

Some empirical testing shows that the new termination condition doesn't make things much slower. For dense matrices of sizes 5x5 to 300x300 with counts in the range of 1-1000, I see a 15% slowdown.
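
For reproducibility, a sketch of how such a timing test could look (the random count-matrix generator and the exact sizes are assumptions based on the description above):

import time
import numpy as np
import msmtools.estimation as msmest

rng = np.random.RandomState(0)
for n in (5, 50, 300):
    # dense random count matrix with entries roughly in the 1-1000 range
    C = rng.uniform(1.0, 1000.0, size=(n, n))
    t0 = time.time()
    msmest.transition_matrix(C, reversible=True, method='dense', maxerr=1e-8)
    print('n=%d: %.3f s' % (n, time.time() - t0))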