markovmodel / pyemma_tutorials

How to analyze molecular dynamics data with PyEMMA
Creative Commons Attribution 4.0 International
save_trajs for multiple trajectories #198

Closed orthonalmatrix closed 2 years ago

orthonalmatrix commented 2 years ago

Hello, I have built and validated my HMM and would like to output sample conformations. I'm following the example in tutorial 7. I get errors when I specify, in the coordinates.source() function, that some trajectories should be treated as one. Here is what works:

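import numpy as np
import pyemma
# Two separate trajectories, read with every 10th frame
data_source = pyemma.coordinates.source(['trajs/linkedtraj1.dcd','trajs/linkedtraj2.dcd'], features=None, top='trajs/full_top.pdb', stride=10)
# Each row of the index array is (trajectory index, frame index)
pyemma.coordinates.save_trajs(
    data_source,
    np.array([[0, 22], [0, 146], [0, 191], [0, 18]])
    )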

However, I want some of my trajectories to be joined, so I passed them as nested lists, as specified here: "Nested lists (1 level) like), eg.: [[‘traj1_0.xtc’, ‘traj1_1.xtc’], ‘traj2_full.xtc’], [‘traj3_0.xtc, …]]"

But when I do that:

data_source = pyemma.coordinates.source([['trajs/linkedtraj1.dcd','trajs/linkedtraj2.dcd']], features=None, top='trajs/full_top.pdb', stride=10)
pyemma.coordinates.save_trajs(
    data_source,
    np.array([ [0, 22],[0,146],[0,191],[0, 18]])
    )

I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-174-2a0f5e2e2b19> in <module>
      3 pyemma.coordinates.save_trajs(
      4     data_source,
----> 5     np.array([ [0, 22],[0,146],[0,191],[0, 18]])
      6     #,outfiles=['hmm_{}.pdb'.format(n + 1) for n in range(hmms[hmnum].nstates)]
      7     )

~/bin/anaconda3/envs/pyemma/lib/python3.7/site-packages/pyemma/coordinates/api.py in save_trajs(traj_inp, indexes, prefix, fmt, outfiles, inmemory, stride, verbose)
    842         import os
    843 
--> 844         _, fmt = os.path.splitext(traj_inp.filenames[0])
    845     else:
    846         fmt = '.' + fmt

~/bin/anaconda3/envs/pyemma/lib/python3.7/posixpath.py in splitext(p)
    120 
    121 def splitext(p):
--> 122     p = os.fspath(p)
    123     if isinstance(p, bytes):
    124         sep = b'/'

TypeError: expected str, bytes or os.PathLike object, not list

Any idea what I am doing incorrectly? Thanks!

marscher commented 2 years ago

It could be unsupported by the save_trajs function, as it expects the first element of the list to be a filename, not a nested element. ping @clonker
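
The failure can be reproduced without PyEMMA. The sketch below uses only the standard library and the file names from the question; it assumes nothing about save_trajs beyond the os.path.splitext(traj_inp.filenames[0]) call visible in the traceback.

```python
import os

flat = ['trajs/linkedtraj1.dcd', 'trajs/linkedtraj2.dcd']
nested = [['trajs/linkedtraj1.dcd', 'trajs/linkedtraj2.dcd']]

# Flat list: the first element is a string, so the extension can be read.
_, ext = os.path.splitext(flat[0])
print(ext)

# Nested list: the first element is itself a list, which splitext rejects.
try:
    os.path.splitext(nested[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```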

clonker commented 2 years ago

Yup, pretty much what @marscher said. When dealing with fragmented trajectories, the filenames attached to the source object (see data_source.filenames) reflect the hierarchical structure, and save_trajs does not support this. On the other hand, the filenames are only used to obtain the output format (which defaults to the input format). So one workaround is to set fmt='dcd' explicitly, i.e.,

pyemma.coordinates.save_trajs(data_source, np.array([ [0, 22],[0,146],[0,191],[0, 18]]), fmt='dcd')
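
Why the explicit fmt helps can be sketched with a stand-in for the branch shown in the traceback (api.py lines 844-846). The function name resolve_extension is hypothetical, and the `fmt is None` condition is an assumption, since the traceback only shows the body of the branch and its else clause.

```python
import os

def resolve_extension(filenames, fmt=None):
    """Hypothetical stand-in for the format lookup seen in the traceback."""
    if fmt is None:  # assumed condition; not visible in the traceback
        # Default: reuse the extension of the first input file.
        _, ext = os.path.splitext(filenames[0])
        return ext
    # Explicit format: the filenames are never inspected.
    return '.' + fmt

nested = [['trajs/linkedtraj1.dcd', 'trajs/linkedtraj2.dcd']]
ext = resolve_extension(nested, fmt='dcd')
print(ext)
```

With fmt given, the nested filenames list is never passed to os.path.splitext, so the TypeError cannot occur.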

orthonalmatrix commented 2 years ago

Thanks! This worked. So is it a problem with the documentation listing: Nested lists (1 level) like), eg.: [[‘traj1_0.xtc’, ‘traj1_1.xtc’], ‘traj2_full.xtc’], [‘traj3_0.xtc, …]]?

clonker commented 2 years ago

The nesting is fine; it is more an incompatibility between how nested lists of trajectories are treated and the implementation of save_trajs. In the linked PR I have proposed a fix so that specifying fmt explicitly is no longer necessary.
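
One way such a fix could look is sketched below. This is not the code from the linked PR; the helper names first_filename and default_extension are hypothetical, and the sketch only assumes one level of nesting, as in the documentation quoted above.

```python
import os

def first_filename(filenames):
    """Return the first file path from a flat or one-level nested list."""
    first = filenames[0]
    # A fragmented trajectory shows up as an inner list of file paths.
    if isinstance(first, (list, tuple)):
        first = first[0]
    return first

def default_extension(filenames):
    """Derive the default output extension from the first input file."""
    _, ext = os.path.splitext(first_filename(filenames))
    return ext

print(default_extension(['a.dcd', 'b.dcd']))
print(default_extension([['traj1_0.xtc', 'traj1_1.xtc'], 'traj2_full.xtc']))
```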

orthonalmatrix commented 2 years ago

Ok. Thanks!