markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

PyEMMA and GROMACS generate a bit different projection results #1525

Closed qasimpars closed 2 years ago

qasimpars commented 2 years ago

Hi,

I was doing PCA using PyEMMA-2.5.1 and GROMACS-2020.1.1 and compare their results. I just noticed that they give a bit different projection results. Is this a bug in PyEMMA?

Code used for PyEMMA:

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import mdshare
import pyemma
import pandas as pd
from pyemma.util.contexts import settings

pdb = 'ref.pdb'
files = []
files = ['traj.xtc']
print(files)
for i in range(len(files)):
    print(i, files[i])

feat = pyemma.coordinates.featurizer(pdb)
data = pyemma.coordinates.load(files, features=feat)
pca = pyemma.coordinates.pca(data, lag=1)
pca_output = pca.get_output()
print(pca_output)

Output:

#xaxis  label "projection on eigenvector 1 (nm)"
#yaxis  label "projection on eigenvector 2 (nm)"
-0.71755    0.61455
-0.89854    0.56703
-0.58625    0.38777
-0.49241    0.58259
...

GROMACS commands:

gmx covar -s ref.pdb -f traj.xtc
gmx anaeig -s ref.pdb -f traj.xtc -v eigenvec.trr -2d projs.xvg -first 1 -last 2

Output:

#xaxis  label "projection on eigenvector 1 (nm)"
#yaxis  label "projection on eigenvector 2 (nm)"
   0.71755   -0.61456
   0.89855   -0.56705
   0.58626   -0.38778
   0.49241   -0.58259
...
clonker commented 2 years ago

PCA is only unique up to the sign so this is what you are observing here. That put aside it is usually not a good idea to use PCA on time-series data. For projection purposes I recommend looking at TICA/VAMP/EDMD family of methods.