Scaling - optimization potential

Marius1311 commented 4 years ago

I just run GPCCA on a sparse, 32k x 32k matrix on my laptop, see profiling output for 6 states in Fig 1 and for 21 states in Fig 2 below. The runtimes are 45s for 6 states and 413s (7 min) for 21 states. One nice property is that krylov scales very well in the number of states: it takes around 18s for 6 states and around 30s for 21 states, so a high number of states is never going to be a scalability issue for krylov.

Bottlenecks

For 6 states, there is no obvious bottleneck. Time seems to divide relatively even between the main steps of the algorithm.

For 21 states, there is a clear bottleneck: the optimization (fmin), and within that, the _fill_matrix function.

Easy optimizations

In the 6 state case, we can easily half the computation time for krylov if we don't compute the eigenvalues twice, once for checking whether we are splitting a block of comp. conj and once for the actual decomposition. Instead, we could just by default compute m+1 vectors, check whether we split a block of complex conjugates and if we do, warn the user and continue with m+1 vectors, otherwise with just m. We can save another 4s by not computing the stationary distribution if the user didn't request it. These two changes would amount to 13s or almost one third of the total computation time.

In the 21 state case, it may be possible to speed up the _fill_matrix function, but I'm not very familiar with it. @msmdev , can you think of possibilities to speed this up?

Fig 1

6 states Screenshot 2020-05-29 at 19 19 25

Fig 2

21 states Screenshot 2020-05-29 at 19 31 49

msmdev commented 4 years ago

@Marius1311 I see your point - it's not surprising (I already thought that I am constructing some overhead there with Krylov-Schur when I wrote it, but I postponed it to later optimizations... so this should be fixed.).

The problem with fmin is also known: it get's much worse, if you try it with lets say 60 states... since Nelder-Mead is a derivative-free method and thus becomes extremely slow with increasing dimensions. I have a solution for this but it involves Gauss-Newton optimization and will need a lot of fiddling and testing to port it from Matlab (there it works fine) to Python. I would prefer to address this sometimes in the future - after merging into the original msmmodels repo.

Regarding the _fill_matrix function I guess I can try some optimizations... let's see :)

Marius1311 commented 4 years ago

Thanks! I think long-term, Gauss-Newton is a good perspective, short/medium term, it would be fantastic if we could just speed up _fill_matrix a bit.

msmdev / msmtools