Open Marius1311 opened 4 years ago
@Marius1311 I see your point - it's not surprising (I already thought that I am constructing some overhead there with Krylov-Schur when I wrote it, but I postponed it to later optimizations... so this should be fixed.).
The problem with fmin
is also known: it get's much worse, if you try it with lets say 60 states... since Nelder-Mead is a derivative-free method and thus becomes extremely slow with increasing dimensions.
I have a solution for this but it involves Gauss-Newton optimization and will need a lot of fiddling and testing to port it from Matlab (there it works fine) to Python.
I would prefer to address this sometimes in the future - after merging into the original msmmodels repo.
Regarding the _fill_matrix
function I guess I can try some optimizations... let's see :)
Thanks! I think long-term, Gauss-Newton is a good perspective, short/medium term, it would be fantastic if we could just speed up _fill_matrix
a bit.
I just run GPCCA on a sparse, 32k x 32k matrix on my laptop, see profiling output for 6 states in Fig 1 and for 21 states in Fig 2 below. The runtimes are 45s for 6 states and 413s (7 min) for 21 states. One nice property is that krylov scales very well in the number of states: it takes around 18s for 6 states and around 30s for 21 states, so a high number of states is never going to be a scalability issue for krylov.
Bottlenecks
For 6 states, there is no obvious bottleneck. Time seems to divide relatively even between the main steps of the algorithm.
For 21 states, there is a clear bottleneck: the optimization (
fmin
), and within that, the_fill_matrix
function.Easy optimizations
In the 6 state case, we can easily half the computation time for krylov if we don't compute the eigenvalues twice, once for checking whether we are splitting a block of comp. conj and once for the actual decomposition. Instead, we could just by default compute m+1 vectors, check whether we split a block of complex conjugates and if we do, warn the user and continue with m+1 vectors, otherwise with just m. We can save another 4s by not computing the stationary distribution if the user didn't request it. These two changes would amount to 13s or almost one third of the total computation time.
In the 21 state case, it may be possible to speed up the
_fill_matrix
function, but I'm not very familiar with it. @msmdev , can you think of possibilities to speed this up?Fig 1
6 states
Fig 2
21 states