markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

Bug pcca and cktest for msm built using statdist option #1454

Closed pl992 closed 4 years ago

pl992 commented 4 years ago

Evaluating the pcca of an msm object built including the statdist option returns the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/msm.py in pcca(self, m)
    943             # this will except if we don't have a pcca object
--> 944             if self._pcca.n_metastable != m or self._pcca.P is not self.P:
    945                 # incorrect number of states or new transition matrix -> recompute

AttributeError: 'MaximumLikelihoodMSM' object has no attribute '_pcca'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-43-b1303c504ba0> in <module>
----> 1 msm.pcca(4)

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/msm.py in pcca(self, m)
    947         except AttributeError:
    948             # didn't have a pcca object yet - compute
--> 949             self._pcca = PCCA(self.P, m)
    950 
    951         # set metastable properties

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/pcca.py in __init__(self, P, m)
     34     """
     35     def __init__(self, P, m):
---> 36         self.set_model_params(P, m)
     37 
     38     def set_model_params(self, P, m):

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/pcca.py in set_model_params(self, P, m)
     60         # TODO: can be improved. pcca computes stationary distribution internally, we don't need to compute it twice.
     61         from msmtools.analysis.dense.pcca import pcca
---> 62         self._M = pcca(P, m)
     63 
     64         # stationary distribution

~/miniconda3/lib/python3.7/site-packages/msmtools/analysis/dense/pcca.py in pcca(P, m)
    359                          " exceeds number of states of transition matrix n = " + str(n))
    360     if not is_transition_matrix(P):
--> 361         raise ValueError("Input matrix is not a transition matrix.")
    362 
    363     # prepare output

ValueError: Input matrix is not a transition matrix.

Applying the cktest to a msm object built including the statdist option returns the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/msm.py in pcca(self, m)
    943             # this will except if we don't have a pcca object
--> 944             if self._pcca.n_metastable != m or self._pcca.P is not self.P:
    945                 # incorrect number of states or new transition matrix -> recompute

AttributeError: 'MaximumLikelihoodMSM' object has no attribute '_pcca'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-37-a2c880b9b518> in <module>
----> 1 msm.cktest(4)

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/estimators/_msm_estimator_base.py in cktest(self, nsets, memberships, mlags, conf, err_est, n_jobs, show_progress)
    943         from pyemma.msm.estimators import ChapmanKolmogorovValidator
    944         if memberships is None:
--> 945             self.pcca(nsets)
    946             memberships = self.metastable_memberships
    947         ck = ChapmanKolmogorovValidator(self, self, memberships, mlags=mlags, conf=conf,

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/msm.py in pcca(self, m)
    947         except AttributeError:
    948             # didn't have a pcca object yet - compute
--> 949             self._pcca = PCCA(self.P, m)
    950 
    951         # set metastable properties

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/pcca.py in __init__(self, P, m)
     34     """
     35     def __init__(self, P, m):
---> 36         self.set_model_params(P, m)
     37 
     38     def set_model_params(self, P, m):

~/miniconda3/lib/python3.7/site-packages/pyemma/msm/models/pcca.py in set_model_params(self, P, m)
     60         # TODO: can be improved. pcca computes stationary distribution internally, we don't need to compute it twice.
     61         from msmtools.analysis.dense.pcca import pcca
---> 62         self._M = pcca(P, m)
     63 
     64         # stationary distribution

~/miniconda3/lib/python3.7/site-packages/msmtools/analysis/dense/pcca.py in pcca(P, m)
    359                          " exceeds number of states of transition matrix n = " + str(n))
    360     if not is_transition_matrix(P):
--> 361         raise ValueError("Input matrix is not a transition matrix.")
    362 
    363     # prepare output

ValueError: Input matrix is not a transition matrix.

Conda environment: conda_env.txt

Operating system: Distributor ID: Debian Description: Debian GNU/Linux 9.12 (stretch) Release: 9.12 Codename: stretch

Example:

msm = pyemma.msm.estimate_markov_model(dtrajs,lag=5,statdist=distr)
msm.pcca(4)
#or
msm.cktest(4)
thempel commented 4 years ago

Sorry for the delay. I cannot reproduce the error. I used this code and it runs smoothly:

import pyemma
import numpy as np

dtraj = np.random.randint(5, size=200)
statdist = np.ones(5)/5

msm = pyemma.msm.estimate_markov_model(dtraj, lag=1, statdist=statdist)
msm.pcca(2)

Does this code work for you? Maybe there is something wrong with your installation of pyemma?

thempel commented 4 years ago

Okay, after discussing with @clonker I believe that the test case above is not sufficient to assess the problem. Apparently, the issue is related to the iterative procedure used for estimating the transition matrix given stationary distribution constraints. If that is not converged "enough", the error will be raised in subsequent steps. You can try to work around the problem for now by trying to find a better converged transition matrix using the maxerr keyword argument.

The problem's cause is that the max error defined during estimation of statdist constraint MSMs is lower than in other parts of PyEMMA or msmtools. These other parts are not accessible by keyword arguments from the user API and span multiple packages, so it'll take some time to properly fix this.

pl992 commented 4 years ago

Sorry for the delay. I cannot reproduce the error. I used this code and it runs smoothly:

import pyemma
import numpy as np

dtraj = np.random.randint(5, size=200)
statdist = np.ones(5)/5

msm = pyemma.msm.estimate_markov_model(dtraj, lag=1, statdist=statdist)
msm.pcca(2)

Does this code work for you? Maybe there is something wrong with your installation of pyemma?

This code works fine.

Okay, after discussing with @clonker I believe that the test case above is not sufficient to assess the problem. Apparently, the issue is related to the iterative procedure used for estimating the transition matrix given stationary distribution constraints. If that is not converged "enough", the error will be raised in subsequent steps. You can try to work around the problem for now by trying to find a better converged transition matrix using the maxerr keyword argument.

The problem's cause is that the max error defined during estimation of statdist constraint MSMs is lower than in other parts of PyEMMA or msmtools. These other parts are not accessible by keyword arguments from the user API and span multiple packages, so it'll take some time to properly fix this.

Indeed, I've tried to decrease the maxerr argument and the problem is gone.

pl992 commented 4 years ago

Thank you!