markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

Opening XTC files using pyemma.coordinates.load(files, features=feat) #1495

Closed andresilvapimentel closed 3 years ago

andresilvapimentel commented 3 years ago

Hi all, I am trying to load an XTC file using mdtraj.load, and then load the coordinates and features using pyemma.coordinates.load(files, features=feat) or pyemma.coordinates.load(files, features=torsions_feat). The PDB file had unit cells, which I removed. The trajectory file also has unit cells, but I was unable to remove them. I think this is what causes the error message below. Do you know how to fix this issue?

The error message is:

ValueError                                Traceback (most recent call last)
in ()
----> 1 data = pyemma.coordinates.load(files, features=feat)
      2 print('type of data:', type(data))
      3 print('lengths:', len(data))
      4 print('shape of elements:', data[0].shape)
      5 print('n_atoms:', feat.topology.n_atoms)

/usr/local/lib/python3.7/dist-packages/pyemma/coordinates/api.py in load(trajfiles, features, top, stride, chunksize, **kw)
    248         return trajs
    249     else:
--> 250         raise ValueError('unsupported type (%s) of input' % type(trajfiles))
    251
    252

ValueError: unsupported type () of input

clonker commented 3 years ago

Hi, this is because your input to load is already an MDTraj trajectory. Can you try giving it the filename(s) instead?
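To illustrate the distinction, here is a minimal sketch in plain Python. The helper name `split_paths_and_objects` and the `LoadedData` class are hypothetical (neither is part of PyEMMA or MDTraj). The helper only separates entries that are file names from entries that are already-loaded objects, such as a trajectory or an array.

```python
import os


class LoadedData:
    """Stand-in for an in-memory object such as a trajectory or an array."""


def split_paths_and_objects(entries):
    """Separate path-like entries from already-loaded objects.

    Hypothetical helper for illustration; not part of PyEMMA or MDTraj.
    Returns (list of path strings, list of type names of everything else).
    """
    paths, objects = [], []
    for entry in entries:
        if isinstance(entry, (str, os.PathLike)):
            paths.append(os.fspath(entry))
        else:
            objects.append(type(entry).__name__)
    return paths, objects


# A list of file names: every entry ends up in the first list.
print(split_paths_and_objects(["traj01.xtc", "traj02.xtc"]))

# A mix of a file name and in-memory data: the second list is non-empty,
# which signals that the input is not a pure list of file names.
print(split_paths_and_objects(["traj01.xtc", LoadedData()]))
```

If the second list is non-empty, the variable holds loaded data rather than file names, and the file names themselves are what should be passed instead.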

andresilvapimentel commented 3 years ago

Thank you, but I did not understand what you mean. Could you explain it in a different way, please?

clonker commented 3 years ago

What exactly does your call look like? You might want to try using it like this:

pyemma.coordinates.load(['file1.xtc', 'file2.xtc'], featurizer=torsions_feat)
andresilvapimentel commented 3 years ago

I also tried it like this:

trajs = ['traj01.xtc', 'traj02.xtc']
files = load(trajs, top='my_structure.pdb')

But it did not work. Any other suggestions?

clonker commented 3 years ago

Can you post the stack trace (full error message) of

trajs = ['traj01.xtc', 'traj02.xtc']
files = load(trajs, top='my_structure.pdb')

? This should work.

andresilvapimentel commented 3 years ago

ValueError                                Traceback (most recent call last)
in ()
      1 torsions_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
      2 torsions_feat.add_backbone_torsions(cossin=True, periodic=False)
----> 3 torsions_data = pyemma.coordinates.load(files, features=torsions_feat)
      4 labels = ['backbone\ntorsions']
      5

/usr/local/lib/python3.7/dist-packages/pyemma/coordinates/api.py in load(trajfiles, features, top, stride, chunksize, **kw)
    248         return trajs
    249     else:
--> 250         raise ValueError('unsupported type (%s) of input' % type(trajfiles))
    251
    252

ValueError: unsupported type () of input

clonker commented 3 years ago

What is the output if you print(files) before the call to load?

andresilvapimentel commented 3 years ago

['spa_no_ca_back_none.pdb']
[array([[5.032 , 5.6740003, 2.4680002, ..., 5.4080005, 4.2640004, 4.8320003],
       [4.9820004, 5.5360003, 2.3100002, ..., 5.51 , 4.2200003, 4.86 ],
       [5.322 , 5.6520004, 2.41 , ..., 5.6340003, 4.124 , 4.9620004],
       ...,
       [9.51 , 1.95 , 5.458 , ..., 6.596 , 3.6940002, 5.51 ],
       [9.412001 , 1.8780001, 6.044 , ..., 6.7860003, 3.6640003, 5.55 ],
       [9.616 , 1.8340001, 6.5160003, ..., 6.8960004, 3.3400002, 5.6580005]], dtype=float32),
 array([[5.032 , 5.6740003, 2.4680002, ..., 5.4080005, 4.2640004, 4.8320003],
       [4.9820004, 5.5360003, 2.3100002, ..., 5.51 , 4.2200003, 4.86 ],
       [5.322 , 5.6520004, 2.41 , ..., 5.6340003, 4.124 , 4.9620004],
       ...,
       [9.51 , 1.95 , 5.458 , ..., 6.596 , 3.6940002, 5.51 ],
       [9.412001 , 1.8780001, 6.044 , ..., 6.7860003, 3.6640003, 5.55 ],
       [9.616 , 1.8340001, 6.5160003, ..., 6.8960004, 3.3400002, 5.6580005]], dtype=float32)]

clonker commented 3 years ago

So it seems that files are not quite the files you want to load but rather the pdb and a bunch of arrays, right? Do you know how to proceed from here? Otherwise, please let me know how you create the files object.

andresilvapimentel commented 3 years ago

Thanks. This is what I was thinking since the beginning... I do not know how to proceed from here to create the right files object. Could you help me, please?

clonker commented 3 years ago

Well, somewhere on your machine you'll have XTC files. Just create a Python list of them and pass it to the load function like so:

import pyemma

my_files = ['path/to/file1.xtc', 'path/to/file2.xtc']  # you can use more (or fewer) than two files of course
featurizer = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
featurizer.add_backbone_torsions(cossin=True, periodic=False)

data = pyemma.coordinates.load(my_files, featurizer=featurizer)

This is really all you need to do. Afterwards, data contains the loaded and featurized trajectories, which you can use for further analysis.

andresilvapimentel commented 3 years ago

I am uploading the PDB and XTC files into the notebook using:

from google.colab import files
uploaded = files.upload()

I loaded the data input as you suggested:

pdb = ['path/to/spa_no_ca_back_none.pdb']
files = ['path/to/spa_back_none.xtc']

I printed the files:

print(pdb)
print(files)

and I got:

['path/to/spa_no_ca_back_none.pdb']
['path/to/spa_back_none.xtc']

Then, I featurized as you suggested:

featurizer = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
data = pyemma.coordinates.load(files, featurizer=featurizer)

and got the error message:

TypeError                                 Traceback (most recent call last)
in ()
      1 featurizer = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
----> 2 data = pyemma.coordinates.load(files, featurizer=featurizer)

/usr/local/lib/python3.7/dist-packages/pyemma/coordinates/api.py in load(trajfiles, features, top, stride, chunksize, **kw)
    241             and (any(isinstance(item, (list, tuple, str)) for item in trajfiles)
    242                  or len(trajfiles) == 0)):
--> 243         reader = create_file_reader(trajfiles, top, features, chunksize=cs, **kw)
    244         trajs = reader.get_output(stride=stride)
    245         if len(trajs) == 1:

TypeError: create_file_reader() got multiple values for argument 'featurizer'

I also tried to featurize in a different way:

torsions_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
torsions_feat.add_backbone_torsions(cossin=True, periodic=False)
torsions_data = pyemma.coordinates.load(files, features=torsions_feat)
labels = ['backbone\ntorsions']

And I also got an error message:

ValueError                                Traceback (most recent call last)
in ()
      1 torsions_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
      2 torsions_feat.add_backbone_torsions(cossin=True, periodic=False)
----> 3 torsions_data = pyemma.coordinates.load(files, features=torsions_feat)
      4 labels = ['backbone\ntorsions']
      5

1 frames
/usr/local/lib/python3.7/dist-packages/pyemma/coordinates/data/util/reader_utils.py in create_file_reader(input_files, topology, featurizer, chunksize, **kw)
     87     if not all_exist:
     88         raise ValueError('Some of the given input files were directories'
---> 89                          ' or did not exist:\n%s' % err_msg.getvalue())
     90     featurizer_or_top_provided = featurizer is not None or topology is not None
     91     # we need to check for h5 first, because of mdtraj custom HDF5 traj format (which is deprecated).

ValueError: Some of the given input files were directories or did not exist:
File path/to/spa_back_none.xtc did not exist or was no file

Please let me know what changes I need to make to correct the errors. Thank you for helping me.

clonker commented 3 years ago
import pyemma
from google.colab import files

uploaded = files.upload()
print("Uploaded files:", list(uploaded.keys()))

files = ["spa_back_none.xtc"]
featurizer = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
featurizer.add_backbone_torsions(cossin=True, periodic=False)
torsions_data = pyemma.coordinates.load(files, features=featurizer)

This should do the trick :) If it doesn't, please paste the error message(s) here.
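For context on the earlier TypeError: the tracebacks above show that load names its second parameter `features` and forwards extra keywords via `**kw` to create_file_reader, whose third parameter is named `featurizer`. The sketch below uses simplified stand-in functions with the same parameter names to reproduce that collision in plain Python. These are not PyEMMA's actual implementations, and the default values are assumptions made only so the example runs.

```python
def create_file_reader(input_files, topology, featurizer, chunksize=None, **kw):
    """Stand-in with the parameter names shown in the traceback."""
    return {"files": input_files, "featurizer": featurizer}


def load(trajfiles, features=None, top=None, stride=1, chunksize=None, **kw):
    """Stand-in: passes `features` positionally and forwards **kw unchanged."""
    return create_file_reader(trajfiles, top, features, chunksize=chunksize, **kw)


feat = object()  # placeholder for a featurizer object

# Correct keyword: `features` fills the reader's third positional slot.
ok = load(["spa_back_none.xtc"], features=feat)

# Wrong keyword: `featurizer` is collected into **kw, so the reader receives
# its `featurizer` parameter twice (once positionally, once by keyword).
try:
    load(["spa_back_none.xtc"], featurizer=feat)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under these assumptions, the keyword to use when calling load is `features`, as in the snippet above.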

andresilvapimentel commented 3 years ago

I did exactly what you suggested and I got the following error message:

ValueError                                Traceback (most recent call last)
in ()
      1 featurizer = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')
      2 featurizer.add_backbone_torsions(cossin=True, periodic=False)
----> 3 torsions_data = pyemma.coordinates.load(files, features=featurizer)

1 frames
/usr/local/lib/python3.7/dist-packages/pyemma/coordinates/data/util/reader_utils.py in create_file_reader(input_files, topology, featurizer, chunksize, **kw)
     87     if not all_exist:
     88         raise ValueError('Some of the given input files were directories'
---> 89                          ' or did not exist:\n%s' % err_msg.getvalue())
     90     featurizer_or_top_provided = featurizer is not None or topology is not None
     91     # we need to check for h5 first, because of mdtraj custom HDF5 traj format (which is deprecated).

ValueError: Some of the given input files were directories or did not exist:
File path/to/spa_back_none.xtc did not exist or was no file

clonker commented 3 years ago

Hi, you still have path/to/spa_back_none_1.xtc instead of just spa_back_none_1.xtc. Could you change that?
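A quick way to check this before calling load is to test each name against the file system. The sketch below uses only the standard library. The helper name `missing_files` is hypothetical, and the demonstration creates an empty placeholder file in a temporary directory rather than using real trajectory data.

```python
import os
import tempfile


def missing_files(paths):
    """Return the entries of `paths` that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Demonstration in a temporary directory with one empty placeholder file.
with tempfile.TemporaryDirectory() as workdir:
    present = os.path.join(workdir, "spa_back_none.xtc")
    open(present, "wb").close()

    # Same file name, but under a subdirectory that was never created.
    absent = os.path.join(workdir, "path", "to", "spa_back_none.xtc")

    result = missing_files([present, absent])
    print(result == [absent])
```

An empty result means every name points at an existing file. Anything returned is a name that needs to be corrected.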

andresilvapimentel commented 3 years ago

Yes, it worked. However, I needed to change the following commands:

torsions_feat = pyemma.coordinates.featurizer(pdb) to torsions_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')

positions_feat = pyemma.coordinates.featurizer(pdb) to positions_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')

distances_feat = pyemma.coordinates.featurizer(pdb) to distances_feat = pyemma.coordinates.featurizer('spa_no_ca_back_none.pdb')

andresilvapimentel commented 3 years ago

Thanks!!! I will follow the tutorial now... it seems everything else is going to work.

clonker commented 3 years ago

Great! 🙂 Closing this issue then, please open a new one if something doesn't work.