MSM.mfpt crashes with LinAlgError in Trypsin_Benzamidine_HMM notebook

marscher commented 8 years ago

During testing the trypsin_benzamidine_hmm notebook I've encounter the following problem. For lag time=1, the evaluation of kinetic/thermodynamic quantities the mfpt method fails for matrix A being singular. However T is fully connected and reversible.

@nsplattner suggested that the new bhmm version gives a converged estimate, which is in this case not converged because we only feed in a very small data set.

We could handle the exception in the notebook, but I'd prefer a handler in the MSM.mfpt function (just to give an informative message, when we know how this is caused).

In the notebook cell 22:

#dG_stats = np.array([binding_free_energy(M) for M in its.models])
#kon_stats = np.array([binding_rate(M) for M in its.models])
#koff_stats = np.array([unbinding_rate(M) for M in its.models])
#print M
for M in its.models:
    print M.is_reversible
    print M.transition_matrix
    unbinding_rate(M)

#### output:
True
[[  9.90278363e-01   2.35753316e-37   4.67501507e-03   0.00000000e+00
    5.04662171e-03]
 [  3.19593056e-37   9.93143278e-01   6.85672226e-03   0.00000000e+00
    3.81226426e-29]
 [  1.17518596e-03   1.27145389e-03   9.93183749e-01   3.43529433e-04
    4.02608166e-03]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   1.00000000e+00
    0.00000000e+00]
 [  3.80537587e-03   2.12050979e-29   1.20769094e-02   8.94882804e-04
    9.83222832e-01]]
bound state 3
unbound state 2
mfpt: A [[  9.72163678e-03  -2.35753316e-37  -4.67501507e-03   0.00000000e+00
   -5.04662171e-03]
 [ -3.19593056e-37   6.85672226e-03  -6.85672226e-03   0.00000000e+00
   -3.81226426e-29]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00   0.00000000e+00
    0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
    0.00000000e+00]
 [ -3.80537587e-03  -2.12050979e-29  -1.20769094e-02  -8.94882804e-04
    1.67771681e-02]]

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-33-34deb81dd4bf> in <module>()
      6     print M.is_reversible
      7     print M.transition_matrix
----> 8     unbinding_rate(M)

<ipython-input-20-95599723ec09> in unbinding_rate(bhmm, conf)
     49     i_unbound, i_bound = index_unbound_bound(bhmm)
     50     # MLE
---> 51     mfpt_mle = bhmm.mfpt(i_bound, i_unbound)
     52     koff = mfpt2koff(mfpt_mle)
     53     # samples

/home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/pyEMMA-2.1.1+89.ga553b9a-py2.7-linux-x86_64.egg/pyemma/msm/models/msm.pyc in mfpt(self, A, B)
    445             set of target states
    446         """
--> 447         return self._mfpt(self.transition_matrix, A, B, mu=self.stationary_distribution)
    448 
    449     def _committor_forward(self, P, A, B):

/home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/pyEMMA-2.1.1+89.ga553b9a-py2.7-linux-x86_64.egg/pyemma/msm/models/msm.pyc in _mfpt(self, P, A, B, mu)
    433         from msmtools.analysis import mfpt as __mfpt
    434         # scale mfpt by lag time
--> 435         return self._timeunit_model.dt * __mfpt(P, B, origin=A, mu=mu)
    436 
    437     def mfpt(self, A, B):

/home/marscher/workspace/msmtools/msmtools/analysis/api.pyc in mfpt(T, target, origin, tau, mu)
    714             t_tau = dense.mean_first_passage_time.mfpt(T, target)
    715         else:
--> 716             t_tau = dense.mean_first_passage_time.mfpt_between_sets(T, target, origin, mu=mu)
    717 
    718     # scale answer by lag time used.

/home/marscher/workspace/msmtools/msmtools/analysis/dense/mean_first_passage_time.py in mfpt_between_sets(T, target, origin, mu)
    128 
    129     """Mean first-passage time to Y (for all possible starting states)"""
--> 130     tY = mfpt(T, target)
    131 
    132     """Mean first-passage time from X to Y"""

/home/marscher/workspace/msmtools/msmtools/analysis/dense/mean_first_passage_time.py in mfpt(T, target)
     87     b = np.ones(dim)
     88     b[target] = 0.0
---> 89     m_t = solve(A, b)
     90     return m_t
     91 

/home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/scipy/linalg/basic.pyc in solve(a, b, sym_pos, lower, overwrite_a, overwrite_b, debug, check_finite)
    101         return x
    102     if info > 0:
--> 103         raise LinAlgError("singular matrix")
    104     raise ValueError('illegal value in %d-th argument of internal gesv|posv' %
    105                      -info)

LinAlgError: singular matrix

marscher commented 8 years ago

For all lagtimes the off rates can not be computed:

koff_stats = []
from scipy.linalg import LinAlgError
for M in its.models:
    try:
        koff_stats.append(unbinding_rate(M))
    except LinAlgError:
        koff_stats.append(None)
print koff_stats
[None, None, (inf, inf, inf), (inf, inf, inf), (inf, inf, inf), None, None]

franknoe commented 8 years ago

T is clearly not fully connected. State 4 is absorbing.

I guess that means that this subset of data is actually disconnected. The new HMM code provides much better convergence to the optimal solution, so it reveals a problem that hasn't been seen before. I think we have a larger set of trajectories (at least 20 microseconds), we could try with those.

Since this is an issue with the data in the notebook rather than with the code, I would suggest to disable this computation of dissociation times and binding affinities for the purpose of this micro-release (it just affects the docs). We can find a more satisfactory solution for 2.2.

Please let me know when the notebooks output has been updated, I'm happy to check them.

Am 27/04/16 um 17:03 schrieb Martin K. Scherer:

During testing the trypsin_benzamidine_hmm notebook I've encounter the following problem. For lag time=1, the evaluation of kinetic/thermodynamic quantities the mfpt method fails for matrix A being singular. However T is fully connected and reversible.

@nsplattner https://github.com/nsplattner suggested that the new bhmm version gives a converged estimate, which is in this case not converged because we only feed in a very small data set.

We could handle the exception in the notebook, but I'd prefer a handler in the MSM.mfpt function (just to give an informative message, when we know how this is caused).

In the notebook cell 22:

dG_stats = np.array([binding_free_energy(M) for M in its.models])

kon_stats = np.array([binding_rate(M) for M in its.models])

koff_stats = np.array([unbinding_rate(M) for M in its.models])

print M

for M in its.models: print M.is_reversible print M.transition_matrix unbinding_rate(M)

output:

True [[9.90278363e-01 2.35753316e-37 4.67501507e-03 0.00000000e+00 5.04662171e-03] [3.19593056e-37 9.93143278e-01 6.85672226e-03 0.00000000e+00 3.81226426e-29] [1.17518596e-03 1.27145389e-03 9.93183749e-01 3.43529433e-04 4.02608166e-03] [0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00] [3.80537587e-03 2.12050979e-29 1.20769094e-02 8.94882804e-04 9.83222832e-01]] bound state3 unbound state2 mfpt:A [[9.72163678e-03 -2.35753316e-37 -4.67501507e-03 0.00000000e+00 -5.04662171e-03] [-3.19593056e-37 6.85672226e-03 -6.85672226e-03 0.00000000e+00 -3.81226426e-29] [0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00 0.00000000e+00] [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00] [-3.80537587e-03 -2.12050979e-29 -1.20769094e-02 -8.94882804e-04 1.67771681e-02]]

LinAlgError Traceback (most recent call last)
in () 6 print M.is_reversible 7 print M.transition_matrix ----> 8 unbinding_rate(M) in unbinding_rate(bhmm, conf) 49 i_unbound, i_bound= index_unbound_bound(bhmm) 50 # MLE ---> 51 mfpt_mle= bhmm.mfpt(i_bound, i_unbound) 52 koff= mfpt2koff(mfpt_mle) 53 # samples /home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/pyEMMA-2.1.1+89.ga553b9a-py2.7-linux-x86_64.egg/pyemma/msm/models/msm.pycin mfpt(self,A,B) 445 set of target states 446 """ --> 447 return self._mfpt(self.transition_matrix, A, B, mu=self.stationary_distribution) 448 449 def _committor_forward(self, P, A, B): /home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/pyEMMA-2.1.1+89.ga553b9a-py2.7-linux-x86_64.egg/pyemma/msm/models/msm.pyc in _mfpt(self, P, A, B, mu) 433 from msmtools.analysis import mfpt as __mfpt 434 # scale mfpt by lag time --> 435 return self._timeunit_model.dt \* __mfpt(P, B, origin=A, mu=mu) 436 437 def mfpt(self, A, B): /home/marscher/workspace/msmtools/msmtools/analysis/api.pyc in mfpt(T, target, origin, tau, mu) 714 t_tau = dense.mean_first_passage_time.mfpt(T, target) 715 else: --> 716 t_tau = dense.mean_first_passage_time.mfpt_between_sets(T, target, origin, mu=mu) 717 718 # scale answer by lag time used. /home/marscher/workspace/msmtools/msmtools/analysis/dense/mean_first_passage_time.py in mfpt_between_sets(T, target, origin, mu) 128 129 """Mean first-passage time to Y (for all possible starting states)""" --> 130 tY = mfpt(T, target) 131 132 """Mean first-passage timefrom X toY""" /home/marscher/workspace/msmtools/msmtools/analysis/dense/mean_first_passage_time.py in mfpt(T, target) 87 b = np.ones(dim) 88 b[target] = 0.0 ---> 89 m_t = solve(A, b) 90 return m_t 91 /home/marscher/anaconda/envs/clean_ipy/lib/python2.7/site-packages/scipy/linalg/basic.pyc in solve(a, b, sym_pos, lower, overwrite_a, overwrite_b, debug, check_finite) 101 return x 102 if info > 0: --> 103 raise LinAlgError("singular matrix") 104 raise ValueError('illegal value in %d-th argument of internal gesv|posv' % 105 -info) LinAlgError: singular matrix — You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/786

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

marscher commented 8 years ago

I've updated the notebooks, they are currently being executed in the jenkins job.

markovmodel / PyEMMA

MSM.mfpt crashes with LinAlgError in Trypsin_Benzamidine_HMM notebook #786

dG_stats = np.array([binding_free_energy(M) for M in its.models])

kon_stats = np.array([binding_rate(M) for M in its.models])

koff_stats = np.array([unbinding_rate(M) for M in its.models])

print M

output:

Mail: Arnimallee 6, 14195 Berlin, Germany