Closed ghost closed 4 years ago
Hi Tia, the custom feature feature expects a function as argument into which MDTraj trajectory objects (frames) are passed. In that spirit can you try something like this?
ermsd = coor.data.CustomFeature(lambda traj: bb.ermsd_traj(ref, traj))
Not sure what the dim=1647
is doing there, though.
Cheers, Moritz
Hello,
I tried the suggestion with a few additions. Here is the code:
`import pyemma import glob import pyemma.coordinates as coor import numpy as np import barnaba as bb
ref = 'asl-reference.pdb' top = 'asl-g34.prmtop' trajs = glob.glob("asl-g34-*.nc")
print (trajs)
reader = coor.source(trajs, top=top)
ermsd = coor.data.CustomFeature(lambda traj: bb.ermsd(ref, trajs, topology=top), dim=1647) reader.featurizer.add_custom_feature(ermsd) data = reader.get_output()
tica = coor.tica(reader) ` It seems to work fine, as the trajectories are loaded in barnaba. However, I now get this error:
TypeError: Cannot cast array from dtype('float64') to dtype('float32') according to the rule 'safe'
PS: I added dim=1647 to the command because otherwise, I get this error
TypeError: init() missing 1 required positional argument: 'dim'
To calculate dimensions, I simply multiplied the number of atoms in my system by 3.
Thank you for your suggestions,
Tia
Hi, could you post the full stack trace? Thanks!
I'm not quite sure what that is, but here is a picture of my screen.
Oh you showed exactly the right thing, what I meant with stack trace is what is written under Traceback
in your screenshot. Can you try the following:
ermsd = coor.data.CustomFeature(lambda traj: bb.ermsd(ref, trajs, topology=top).astype(np.float32), dim=1647)
And regarding the dim
: I am not sure that a dim=1647
is the right thing here. What you want to put there is the output dimension of the ermsd
computation. I assume that it is a scalar quantity, i.e., 1
? So when you compute bb.ermsd(ref, trajs, topology=top)
I would expect a resulting shape
of (length-of-trajectory, 1)
or equivalent.
Argh and I just realized that the lambda definition is not correct like this. It should be
ermsd = coor.data.CustomFeature(lambda traj: bb.ermsd(ref, traj, topology=top).astype(np.float32), dim=1647)
(note the traj
rather than trajs
in the ermsd
call)
So i fixed something up for you using the sample data of BARNABA:
import pyemma
import numpy as np
import barnaba as bb
import pyemma.coordinates as coor
import mdtraj
top = 'sample1.pdb'
traj = 'samples.xtc'
reader = coor.source([traj], top=top)
ref = mdtraj.load_frame(traj, 0, top=top) # first frame of traj
def ermsd_feature(traj):
result = bb.ermsd_traj(ref, traj)
return result.astype(np.float32).reshape(-1, 1)
ermsd = coor.data.CustomFeature(ermsd_feature, dim=1)
reader.featurizer.add_custom_feature(ermsd)
data = reader.get_output()
This should run through without any errors. BUT since it seems like you want to apply TICA afterwards I am guessing you want to find out something about the kinetics of your trajectory. For this matter eRMSD is probably not enough as a feature, consider adding more. If not you will end up with a one-dimensional trajectory.
This works if I only load 1 trajectory, but I have 30 of them. I have tried using glob to load them all, but the ermsd calculation fails, with this error:
The same thing happens when I try to load them as a list. How should I proceed?
That is probably because of how you pass the trajectories. My guess is that you pass a list of lists somehow, in which case pyemma will use a FragmentedTrajectoryReader
. You will want to pass a list of file names instead.
So if you just took the code i posted above, make sure to use something like
trajs = glob.glob(...)
reader = coor.source(trajs, top=top)
and not coor.source([trajs], top=top)
.
Thank you so much :)
I have tried adding other features but it fails.
That is because it didn't find any phi/psi backbone torsions in the PDB. You can pass a custom selection string to use a specific set of backbone angles that is not phi/psi. You can also try to instantiate a featurizer directly, as described here although it probably does (and should) not make a difference.
Hi Tia, you wrote you are working with RNA systems. Pyemma's featurizer is generally designed for protein systems, i.e. method names refer to protein definitions of e.g. what are backbone atoms and how to compute the torsion angles between them. So calling featurizer.add_backbone_torsions()
won't work because it doesn't find atoms that fit this definition ('empty indices'). I think that going via custom feature functions as you did before is a good way to implement feature functions in this case.
I will try that, thanks
Hello, According to the previous comment, I decided to add another custom feature from barnaba, namely g-vectors. Unlike eRMSD that yields a 1-D array, the g-vector calculation yields a 4-D array with dimension (m,n,n,4), where m is the number of frames loaded, and n is the number of nucleotides and it looks like this:
To insert this custom feature, I want to flatten the array to (m,nn4), so I used this definition for gvec feature:
This is what happens when I load this feature in PyEMMA:
I am not sure what the problem is.
Thank you.
That is because dump_gvec
expects a file, not a trajectory python object. For this, try dump_gvec_traj
.
You can see this by looking at the stack trace, it says `.../barnaba/functions.py, line 168, in dump_gvec", and having a look at that particular file at that particular line reveals that it tries to load() a file instead of extracting g vectors from an object.
Also the reshape should be the other way around, i.e., .reshape(-1, n*n*4)
; this is because PyEMMA might load the trajectory chunk-wise but the latter dimension will always stay the same (just the length might change). Or even: traj.reshape(traj.xyz.shape[0], -1)
.
This is the error I get when I try the first suggestion:
I'm not sure how to implement the second suggestion. This is how I did it:
Even though this is not really a pyemma issue but rather a barnaba thing: dump_gvec_traj
returns a tuple of two where the first element is matrix you are after and the second element is the list of residues. To make it work you have to first extract the matrix like result = bb.dump_gvec_traj(traj)[0]
.
The thing to remember with custom features is really that the input is a MDTraj trajectory object and the output is expected to be a NumPy array of shape (trajectory_length, feature_dimensions)
and dtype np.float32
. Beyond that you are free to do whatever with the data you got at hand and it should work with the pyemma pipeline.
The second suggestion is just how to do the reshape differently, both implementations should do the same in principle. Anyhow the first implementation looks cleaner and also correct (up to extraction of the tuple element and an I think unneccessary np.asarray(...)
) to me.
Thank you
Hello,
I'm new to PyEMMA and I am trying to build an eRMSD MSM for an RNA system. (eRMSD is calculated using BARNABA - python package). This is the code I'm trying to use to define my custom feature:
`import pyemma import pyemma.coordinates as coor import glob import matplotlib.pyplot as plt import barnaba as bb import numpy as np
ref = 'reference.pdb' traj_list = glob.glob("trajectory-*.nc") top = 'topology.prmtop'
print (traj_list)
create reader
reader = coor.source(traj_list, top=top)
add feature ***
ermsd = coor.data.CustomFeature((bb.ermsd(ref,f,topology=top)[0] for f in traj_list), dim=1647) reader.featurizer.add_custom_feature(ermsd)
perform tica on feature space
tica = coor.tica(reader)`
It all seems to work fine until I try to perform tica. When I do, I get the error "TypeError: 'generator' object is not callable".
I think it might be the way I'm defining my custom feature, but I don't know how to fix it.
Your help is greatly apprecieated,
Tia