markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

Error while calculating VAMP2 score at different lag times #1535

Closed Bazzinga18 closed 2 years ago

Bazzinga18 commented 2 years ago

Hi, while running a VAMP2 score analysis for a particular feature, varying the dimension parameter over several lag times, I got this error: RuntimeError: requested more output dimensions (2) than dimension of input data (1)

although the total TICA dimension is 220.

I used this code:

import matplotlib.pyplot as plt
import numpy as np
import pyemma


def score_cv(data, dim, lag, number_of_splits=5, validation_fraction=0.5):
    # data is iterated over as a sequence of trajectories;
    # a random subset of them is held out for validation in each split
    with pyemma.util.contexts.settings(show_progress_bars=False):
        nval = int(len(data) * validation_fraction)
        scores = np.zeros(number_of_splits)
        for n in range(number_of_splits):
            ival = np.random.choice(len(data), size=nval, replace=False)
            vamp = pyemma.coordinates.vamp(
                [d for i, d in enumerate(data) if i not in ival], lag=lag, dim=dim)
            scores[n] = vamp.score([d for i, d in enumerate(data) if i in ival])
    return scores


lags = [1, 2, 5, 10, 20]
dims = [i + 1 for i in range(10)]

fig, ax = plt.subplots()
for i, lag in enumerate(lags):
    scores_ = np.array([score_cv(data, dim, lag) for dim in dims])
    scores = np.mean(scores_, axis=1)
    errors = np.std(scores_, axis=1, ddof=1)
    color = 'C{}'.format(i)
    # plot against the full list of dimensions (dims), not a single value
    ax.fill_between(dims, scores - errors, scores + errors, alpha=0.3, facecolor=color)
    ax.plot(dims, scores, '--o', color=color, label='lag={:.4f}ns'.format(lag * 0.4))
ax.legend()
ax.set_xlabel('number of dimensions')
ax.set_ylabel('VAMP2 score')
fig.tight_layout()

Please let me know as soon as possible ;) Thanks.

clonker commented 2 years ago

Hi Bazzinga! To see what is going on, we are going to need a bit more context. It seems to me, though, that you are not passing in the TICA-transformed data with 220 dimensions but something else. By the way, you can embed code fragments by enclosing them in triple backticks (```) :slightly_smiling_face:

Bazzinga18 commented 2 years ago

Hey, thanks for your response. Let me show you the script from the start; please tell me what's wrong:


feat.add_distances(indices=(1969,1979,1989,2004,2028,2043,2050,2066,2083,2102,2121,2143,2165,2179,2189,2204,2218,2237,2249), indices2=(2005,2029,2044,2051,2067,2084,2103,2122,2144,2166,2180,2190,2205,2219,2238,2250,2265,2275,2285), periodic=False)
data = pyemma.coordinates.load(files, features=feat)
labels = ['atom\ndistances']
lags = [1, 2, 5, 10, 20]
dims = [i + 1 for i in range(10)]

fig, ax = plt.subplots()
for i, lag in enumerate(lags):
    scores_ = np.array([score_cv(data, dim, lag) for dim in dims])
    scores = np.mean(scores_, axis=1)
    errors = np.std(scores_, axis=1, ddof=1)
    color = 'C{}'.format(i)
    # plot against the full list of dimensions (dims), not a single value
    ax.fill_between(dims, scores - errors, scores + errors, alpha=0.3, facecolor=color)
    ax.plot(dims, scores, '--o', color=color, label='lag={:.4f}ns'.format(lag * 0.4))
ax.legend()
ax.set_xlabel('number of dimensions')
ax.set_ylabel('VAMP2 score')
fig.tight_layout()

Anything else you need to know?

clonker commented 2 years ago

Thanks! Can you please post the output of feat.describe()?

clonker commented 2 years ago

For scoring you can also try the following and see if it works, but I suspect the problem is elsewhere:

from deeptime.decomposition import VAMP, vamp_score_cv

estimator = VAMP(lagtime=lag, dim=dim)
scores = vamp_score_cv(estimator, data, 
    lagtime=1000  # note that this is a different lagtime: it is used to split the trajectory into blocks of length "lagtime" or longer
)

Bazzinga18 commented 2 years ago

> Thanks! Can you please post the output of feat.describe()?

feat.describe()[:10]
['DIST: ASP 120 O 1969 0 - ARG 124 N 2005 0',
 'DIST: ASP 120 O 1969 0 - GLU 125 N 2029 0',
 'DIST: ASP 120 O 1969 0 - GLY 126 N 2044 0',
 'DIST: ASP 120 O 1969 0 - VAL 127 N 2051 0',
 'DIST: ASP 120 O 1969 0 - MET 128 N 2067 0',
 'DIST: ASP 120 O 1969 0 - LEU 129 N 2084 0',
 'DIST: ASP 120 O 1969 0 - ILE 130 N 2103 0',
 'DIST: ASP 120 O 1969 0 - LYS 131 N 2122 0',
 'DIST: ASP 120 O 1969 0 - LYS 132 N 2144 0',
 'DIST: ASP 120 O 1969 0 - THR 133 N 2166 0']

feat.dimension()
361

clonker commented 2 years ago

Alright, that is good :slightly_smiling_face: Can you please check the contents of data? It should be a list of length "number of files", with each list element a NumPy array of shape (n_frames, 361).
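
One way to run that check is sketched below. This is only an illustration: describe_data is a hypothetical helper (not part of PyEMMA or deeptime), and the arrays are small synthetic stand-ins for real featurized trajectories. It assumes that a bare 2D array represents a single trajectory of shape (n_frames, n_features).

```python
import numpy as np


def describe_data(data):
    """Summarize featurized data: container type, trajectory count, shapes.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, np.ndarray):
        # Assumption: a bare array is ONE trajectory of shape (n_frames, n_features).
        container = "ndarray"
        trajs = [data]
    else:
        # Otherwise expect a sequence with one array per trajectory file.
        container = "list"
        trajs = [np.asarray(t) for t in data]
    return {
        "container": container,
        "n_trajectories": len(trajs),
        "shapes": [t.shape for t in trajs],
    }


# Synthetic stand-ins: one trajectory vs. two trajectories with 361 features.
single = np.zeros((100, 361))
several = [np.zeros((100, 361)), np.zeros((80, 361))]

print(describe_data(single))
print(describe_data(several))
```

If the container turns out to be a bare array, a function that loops over data as if it were a list of trajectories will loop over individual frames instead.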

Bazzinga18 commented 2 years ago

type of data: <class 'numpy.ndarray'>
lengths: 25420
shape of elements: (361,)
n_atoms: 2399

clonker commented 2 years ago

Alright, so this means that you don't have several trajectory files but just one, is that right? In that case the function you are using is not appropriate for computing a cross-validated VAMP score. Please use the one I suggested above:

from deeptime.decomposition import VAMP, vamp_score_cv

estimator = VAMP(lagtime=lag, dim=dim)
scores = vamp_score_cv(estimator, data, 
    lagtime=1000  # note that this is a different lagtime: it is used to split the trajectory into blocks of length "lagtime" or longer
)
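
A minimal sketch of why the original score_cv breaks on a single trajectory, using a synthetic array with the same number of features in place of the real data. Iterating over a 2D array yields its rows, so score_cv treats every frame as a separate "trajectory" of shape (361,), which is presumably where the input dimension of 1 in the error message comes from. The helper split_into_blocks is hypothetical and not part of PyEMMA or deeptime.

```python
import numpy as np

# Synthetic stand-in for one featurized trajectory: (n_frames, n_features).
data = np.random.default_rng(0).normal(size=(1000, 361))

# Enumerating a 2D array walks over its first axis, i.e. over single frames.
first = next(iter(data))
print(len(data), first.shape)  # 1000 (361,)


def split_into_blocks(traj, n_blocks):
    """Cut one trajectory into contiguous blocks along the time axis.

    Hypothetical helper: a rough stand-in for proper block splitting.
    """
    return np.array_split(traj, n_blocks, axis=0)


# A list of 2D arrays is what a list-based cross-validation loop expects.
blocks = split_into_blocks(data, 10)
print(len(blocks), blocks[0].shape)  # 10 (100, 361)
```

Passing blocks rather than data to a function like score_cv would give it a list of trajectories to divide into training and validation sets. This is only an approximation of what vamp_score_cv is described as doing above, and every block has to stay longer than the lag time being scored.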

Bazzinga18 commented 2 years ago

Yes, there is only one trajectory file. Okay, thanks, I tried to do that. :)

clonker commented 2 years ago

So... did it work? :wink:

clonker commented 2 years ago

Please feel free to reopen if this is still an issue.