Closed cgseitz closed 2 years ago
Hi, could you send us a small test script and/or the full stacktrace? Thanks!
Sure, here is the full stacktrace. Let me know if you need anything else~
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-82-58a881b0ce7b> in <module>
1 #lag time here doesn't really matter
2 print("2009 glycosylated HA")
----> 3 data_2009_glycosylated = pyemma.coordinates.load(trajs1, features=feat_breathing_2009g)
4 print('trajectory time step = .06 ns')
5 print("data_2009_glycosylated has",len(data_2009_glycosylated), "trajectories")
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/api.py in load(trajfiles, features, top, stride, chunksize, **kw)
242 or len(trajfiles) is 0)):
243 reader = create_file_reader(trajfiles, top, features, chunksize=cs, **kw)
--> 244 trajs = reader.get_output(stride=stride)
245 if len(trajs) == 1:
246 return trajs[0]
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in get_output(self, dimensions, stride, skip, chunk)
404 pg.register(it.n_chunks, description='getting output of %s' % self.__class__.__name__)
405 with pg.context(), it:
--> 406 for itraj, chunk in it:
407 i = slice(it.pos, it.pos + len(chunk))
408 assert i.stop - i.start > 0
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in __next__(self)
1047 self._select_file(self._itraj)
1048 try:
-> 1049 X = self._use_cols(self._next_chunk())
1050 self._t += len(X)
1051 except StopIteration as e:
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in _next_chunk(self)
1169 # We discard the trajectory index here for transformation
1170 if self.transform_function is not None:
-> 1171 x = self.transform_function(x)
1172 return x
1173
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/feature_reader.py in transform(data)
153 return data
154 else:
--> 155 return self.featurizer.transform(data)
156
157 it = FeatureReaderIterator(self, skip=skip, chunk=chunk, stride=stride, return_trajindex=return_trajindex,
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/featurization/featurizer.py in transform(self, traj)
910 vec.shape[0]))
911 else:
--> 912 vec = f.transform(traj).astype(np.float32)
913 feature_vec.append(vec)
914
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/featurization/misc.py in transform(self, traj)
308 traj_copy = traj_copy.image_molecules()
309 for aas, mms in zip(self.group_definitions, self.masses_in_groups):
--> 310 COM_xyz.append(np.average(traj_copy.xyz[:, aas, ], axis=1, weights=mms))
311 return np.hstack(COM_xyz)
312
<__array_function__ internals> in average(*args, **kwargs)
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/numpy/lib/function_base.py in average(a, axis, weights, returned)
395 "differ.")
396 if wgt.ndim != 1:
--> 397 raise TypeError(
398 "1D weights expected when shapes of a and weights differ.")
399 if wgt.shape[0] != a.shape[axis]:
TypeError: 1D weights expected when shapes of a and weights differ.
One more question, if I only wanted the center of masses of the alpha carbons of the selections, how would I incorporate that into a custom feature?
Having a closer look at your code, I think the problem is here:
COM_RBD1 = GroupCOMFeature(feat_breathing_2009g.topology, [[list(range(66,286))]])
It creates group definitions as a triply-nested list [[[66, 67, 68, ...]]]
while the featurizer expects a list of groups, i.e., [[66, 67, 68, ...]]
. I'm not sure I understand your second question properly, the GroupCOM feature already gives you the center of mass if you give it the indices of C alphas.
Hello,
When I change the code as you suggest
feat_breathing_2009g = pyemma.coordinates.featurizer(top1)
from pyemma.coordinates.data.featurization.misc import GroupCOMFeature
COM_RBD1 = GroupCOMFeature(feat_breathing_2009g.topology, [list(range(66,286))])
COM_central_helices_top = GroupCOMFeature(feat_breathing_2009g.topology, [[419, 985, 1551]])
def feature_1(traj: md.Trajectory):
centers1 = COM_RBD1.transform(traj) # yields ndarray
centers2 = COM_central_helices_top.transform(traj) # yields ndarray
xyz = np.hstack((centers1, centers2))
traj = md.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
# this has shape (n_frames, 1)
distance1 = np.linalg.norm(np.subtract(centers1,centers2)) #does not return a numpy array
return distance1
feat_breathing_2009g.add_custom_func(feature_1, dim=1, description='breathing')
data_2009_glycosylated = pyemma.coordinates.load(trajs1, features=feat_breathing_2009g)
The matrix does not reshape properly, into something that is appropriately divisible:
ValueError Traceback (most recent call last)
<ipython-input-14-58a881b0ce7b> in <module>
1 #lag time here doesn't really matter
2 print("2009 glycosylated HA")
----> 3 data_2009_glycosylated = pyemma.coordinates.load(trajs1, features=feat_breathing_2009g)
4 print('trajectory time step = .06 ns')
5 print("data_2009_glycosylated has",len(data_2009_glycosylated), "trajectories")
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/api.py in load(trajfiles, features, top, stride, chunksize, **kw)
242 or len(trajfiles) is 0)):
243 reader = create_file_reader(trajfiles, top, features, chunksize=cs, **kw)
--> 244 trajs = reader.get_output(stride=stride)
245 if len(trajs) == 1:
246 return trajs[0]
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in get_output(self, dimensions, stride, skip, chunk)
404 pg.register(it.n_chunks, description='getting output of %s' % self.__class__.__name__)
405 with pg.context(), it:
--> 406 for itraj, chunk in it:
407 i = slice(it.pos, it.pos + len(chunk))
408 assert i.stop - i.start > 0
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in __next__(self)
1047 self._select_file(self._itraj)
1048 try:
-> 1049 X = self._use_cols(self._next_chunk())
1050 self._t += len(X)
1051 except StopIteration as e:
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/_base/datasource.py in _next_chunk(self)
1169 # We discard the trajectory index here for transformation
1170 if self.transform_function is not None:
-> 1171 x = self.transform_function(x)
1172 return x
1173
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/feature_reader.py in transform(data)
153 return data
154 else:
--> 155 return self.featurizer.transform(data)
156
157 it = FeatureReaderIterator(self, skip=skip, chunk=chunk, stride=stride, return_trajindex=return_trajindex,
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/featurization/featurizer.py in transform(self, traj)
890 if isinstance(f, CustomFeature):
891 # NOTE: casting=safe raises in numpy>=1.9
--> 892 vec = f.transform(traj).astype(np.float32, casting='safe')
893 if vec.shape[0] == 0:
894 vec = np.empty((0, f.dimension))
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/featurization/misc.py in transform(self, traj)
113
114 def transform(self, traj):
--> 115 feature = self._fun(traj, *self._args, **self._kwargs)
116 if not isinstance(feature, np.ndarray):
117 raise ValueError("your function should return a NumPy array!")
<ipython-input-13-2695c00a87f4> in feature_1(traj)
13 centers2 = COM_central_helices_top.transform(traj) # yields ndarray
14 xyz = np.hstack((centers1, centers2))
---> 15 traj = md.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
16 # this has shape (n_frames, 1)
17 #distance1 = np.linalg.norm(centers1 - centers2) #does not return a numpy array
ValueError: cannot reshape array of size 44178 into shape (3,3)
I am generating center of mass distances as 1D points; do I need to keep these 3D? For the other part, I guess I am not clear on whether the indices you give to the GroupCOMFeature
will be read as residue indices or atom indices.
Regarding your stacktrace: It pretty much already says what is wrong with your code. You reshape
traj = md.Trajectory(xyz.reshape(-1, 3, 3), topology=None)
meaning that the coordinates are supposed to be of shape (time, 3 atoms, 3 spatial dimensions)
which does not equally divide up the amount of data you got in xyz
. Please check the outputs of your COM computations and reshape accordingly. That the distances are one-dimensional is not a problem although you might want to use some more features than just that, otherwise you are prone to large projection errors.
Also in your implementation you evaluate
distance1 = np.linalg.norm(np.subtract(centers1,centers2)) #does not return a numpy array
which returns a single scalar. This is in all likelihood not what you want, rather you want to evaluate the COM distances in every frame. :slightly_smiling_face: In general I suggest running a debugger for these matters and checking the shape output step by step (or even good old print debugging).
Thanks! With your help I've made progress. I am trying to use the residue COM definition, where I can add the residue indices I need. However, the code syntax is not the same as the documentation:
COM_RBD2 = ResidueCOMFeature(feat_breathing_2009g.topology, residue_indices=np.array([[632,852]]), residue_atoms=np.array([[632,852]]), scheme='all'
TypeError Traceback (most recent call last)
<ipython-input-77-0f7e165c1eaa> in <module>
23 #COM_RBD2 = ResidueCOMFeature(feat_breathing_2009g.topology, [list(range(632,852))], scheme='all')
24 #COM_RBD2 = ResidueCOMFeature(feat_breathing_2009g.topology, [[632, 852]], [[632,842]], scheme='all')
---> 25 COM_RBD2 = ResidueCOMFeature(feat_breathing_2009g.topology, residue_indices=np.array([[632,852]]), residue_atoms=np.array([[632,852]]), scheme='all')
26 COM_RBD3 = ResidueCOMFeature(feat_breathing_2009g.topology, [list(range(1198,1418))])
27 COM_central_helices_top = ResidueCOMFeature(feat_breathing_2009g.topology, [[419, 985, 1551]])
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/pyemma/coordinates/data/featurization/misc.py in __init__(self, topology, residue_indices, residue_atoms, scheme, ref_geom, image_molecules, mass_weighted)
339 for ri in self.residue_indices:
340 for coor in 'xyz':
--> 341 self._describe.append('%s COM-%s (%s)' % (topology.residue(ri), coor, self.scheme))
342 # TODO consider including the ref_geom and image_molecule arsg here?
343
~/miniconda3/envs/pyemma/lib/python3.9/site-packages/mdtraj/core/topology.py in residue(self, index)
808 The `index`-th residue in the topology.
809 """
--> 810 return self._residues[index]
811
812 @property
TypeError: only integer scalar arrays can be converted to a scalar index
What is the difference between residue_indices and residue_atoms, considering both are required arguments here? How do I appropriately call them? It appears only one is requested in the documentation. http://www.emma-project.org/latest/api/generated/pyemma.coordinates.featurizer.html#pyemma.coordinates.data.featurization.featurizer.MDFeaturizer.add_residue_COM
Hi, glad to hear it. The syntax is exactly as in the documentation :wink:
What you are trying to do is creating a new instance of ResidueCOMFeature
directly, which indeed takes atom indices (of the atoms present in the residues). What you are linking though is the add_residue_COM
method of the Featurizer class. Here indeed a ResidueCOMFeature
instance is internally created, the atom indices are fetched automatically though.
At the very top of the documentation website you linked is an example on how to create/use the featurizer.
Thanks for clarifying, it now makes sense how ResidueCOMFeature
is a part of add_residue_COM
. I can see how the ResidueCOMFeature
may not be able to recognize atoms if the atom indices don't correspond to a residue I have also indexed. With this, I have created my features, many thanks! Have you thought about giving more examples for feature creation on the pyemma page, especially for custom features? I would certainly be willing to share the features I've made if it can help the community.
Great! I think examples are always good, I am looking forward to a PR!
Hello,
I am trying to create three features, which I believe need to be created with the
pyemma.coordinates.data.featurization.misc.GroupCOMFeature
(but I could be wrong). I created center of mass a, b, c, and d. I want to measure the distance between group a and d, b and d, and c and d. Here is an example for group a and d:where the commented out
distance1
are things I've tried. Then I load the trajectories with the features:data_2009_glycosylated = pyemma.coordinates.load(trajs1, features=feat_breathing_2009g)
and each time I get the same error:TypeError: 1D weights expected when shapes of a and weights differ.
Clearly, the error is some small syntax. Could you point out what's wrong? Thanks!Best, Christian