markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
306 stars 119 forks source link

Possibility of specifying the tolerance for the MSM class #1578

Closed wehs7661 closed 1 year ago

wehs7661 commented 1 year ago

Hi, I am a beginner using PyEMMA and I was using PyEMMA to analyze a transition matrix directly parsed from a GROMACS expanded ensemble simulation. When instantiating the class MSM in pyemma.msm with the transition matrix (which was a numpy array), I got the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Wei-TseHsu/gmxapi_21.4/lib/python3.9/site-packages/pyemma/msm/models/msm.py", line 96, in __init__
    self.set_model_params(P=P, pi=pi, reversible=reversible, dt_model=dt_model, neig=neig)
  File "/Users/Wei-TseHsu/gmxapi_21.4/lib/python3.9/site-packages/pyemma/msm/models/msm.py", line 148, in set_model_params
    self.update_model_params(P=P)
  File "/Users/Wei-TseHsu/gmxapi_21.4/lib/python3.9/site-packages/pyemma/_base/model.py", line 79, in update_model_params
    setattr(self, key, value)  # set parameter for the first time.
  File "/Users/Wei-TseHsu/gmxapi_21.4/lib/python3.9/site-packages/pyemma/msm/models/msm.py", line 186, in P
    raise ValueError('T is not a transition matrix.')

To my understanding, the issue occurred because the sum of each row in my input transition matrix was very close but not exactly 1. Specifically, I've tried to use the function is_transition_matrix in deeptime.markov.tools.analysis with varying tolerances (specified by tol) and is_transition_matrix(A, tol=1e-7) returned true. However, when is_transition_matrix was used during the instantiation of MSM, the tolerance was specified as 1e-8, so the error occurred. (See Line 185 in msm.py.) While I temporarily solved the problem by installing the package from the source with the developer option and modifying the source code, I'm wondering if it's worth adding an optional parameter in MSM so that the user can specify the tolerance for telling if a matrix is a transition matrix.

By the way, I was using PyEMMA 2.5.12 on MacOS Catalina 10.15.7. A text file with the output of pip list has been attached: pip_list.txt

wehs7661 commented 1 year ago

Actually, I later encountered another similar problem after specifying a larger tolerance in Line 185 mentioned above. Specifically, the input matrix was not recognized as a transition matrix by is_reversible in deeptime. I'm wondering if I should tweak my transition matrix so that I can use deeptime and PyEMMA to analyze it. If so, what strategies would be generally good to tweak a transition matrix? Thanks in advance!

clonker commented 1 year ago

Hi! I would recommend using deeptime directly without PyEMMA as a proxy. Eventually this PyEMMA functionality is going to be removed and we migrate to deeptime entirely. In deeptime you do have the option to specify a tolerance for transition matrices, see here: https://deeptime-ml.github.io/latest/api/generated/deeptime.markov.msm.MarkovStateModel.html

Otherwise your quickfix is not a bad option either and if you are willing to make a PR that introduces this option to PyEMMA then I am happy to review and merge it! Tweaking the matrix does not seem like a good idea unless you are absolutely sure about the implications.

Cheers, Moritz.

wehs7661 commented 1 year ago

Hi Moritz, Thanks a lot for the quick reply (and sorry I was unable to get back to you sooner)! I have just tried to modify the source code of msm.py in PyEMMA. However, since PyEMMA heavily relies on deeptime, I might also need to modify some source code in deeptime. Specifically, msm.py uses is_transition_matrix and is_reversible in deeptime, so I added an optional argument of tol to both Line 185 and Line 191. is_reversible in Line 191 then uses stationary_distribution, without specifying a tolerance. Therefore, to make the optional argument in msm.py in PyEMMA fully functional, I'll also modify _stationary_vector.py and _assessment.py in deeptime.

Also, I assume that markov_model is the only user function that takes in an input transition matrix and therefore needs modification. Please let me know if I miss any other classes that need such a modification, then I'll make changes for all of them. Thanks a lot!

clonker commented 1 year ago

Hi there,

this sounds great! I am happy to review the changes and merge them. I think it is also fine to gradually improve these things, so even if we miss a place where the tolerance would factor in, it is not that big of a deal and it can still be updated later on.

I am looking forward to your PRs :slightly_smiling_face:

Thanks, Moritz

wehs7661 commented 1 year ago

Great! I have created pull requests for both packages!