markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

Modifying `add_minrmsd_to_ref` to return the RMSD value corresponding for the frame of the reference trajectory that gave the lowest RMSD #1442

Closed eric-jm-lang closed 4 years ago

eric-jm-lang commented 4 years ago

Hello,

I am using the add_minrmsd_to_ref featurizer for a project but when using a trajectory as reference, instead of adding the RMSD with respect to the first frame of the reference trajectory to the features, I would like instead to have the RMSD calculated with respect to each frame of the reference trajectory, and that it adds as a feature the RMSD for the frame of the reference trajectory that gave the lowest RMSD.

This is because I am looking at a multimeric protein experiencing a change in symmetry during the MD simulations. Imagine for example something like shown bellow: the hexamer on the left evolves into a dimer of trimer on the right.

    B                      
A       C                B   C          A  B
                 --->    A   D    or    F  C  ...
F       D                F   E          E  D
    E

Now for the dimer of trimer, there are 6 arrangements of the chains possible (6 ways for the 6 chains to be separated into 2 trimers depending on where the dimer interface is.

Sadly, after many tests, the way the RMSD is calculated seems to be dependant on the residue or chain number and although for my purpose all 6 arrangements are equivalent, if I fit the 6 arrangements and calculate the RMSD with respect to the first possible arrangement, I won't get a RMSD of 0 for the 5 remaining arrangements because chain A of one arrangement is aligned onto chain A of an other, etc.

One way to overcome this is to put all three possible dimers of trimers into a reference trajectory and to add as a feature the RMSD value with respect to the reference frame that gives the lowest RMSD value of all 6 possible arrangement.

The problem I don't know how to do that, for example how can I define a custom feature that does this? Or is it possible to easily modify add_minrmsd_to_ref so that the RMSD is not calculated for only one frame of a reference trajectory

Note that I could define a feature using add_minrmsd_to_ref for all 6 possible arrangements, but If I do that I end up with 6 dimensions and I only want one for this particular rmsd feature.

I hope it makes sense and many thanks in advance for your help!

thempel commented 4 years ago

You can write a custom function that processes an mdtraj.Trajectory object. Something like this might work, in the end you just need the minimum over the 6 dimensions that you are describing. No warranty though, I have not tested this and, additionally, it's probably inefficient:

ref = mdtraj.load('all-three-possible-dimers-of-trimers-reference.xxx')

def min_min_rmsd(traj):
    return np.array([mdtraj.rmsd(traj, ref, frame=n) for n in range(ref.n_frames)]).min(axis=0)[:, np.newaxis]

feat = pyemma.coordinates.featurizer('top_file.pdb')
feat.add_custom_func(min_min_rmsd, 1)

Maybe this is also interesting in this context.

Let me know if that's what you were looking for.

eric-jm-lang commented 4 years ago

This is exactly what I was looking for, thank you so much @thempel ! add_custom_func is indeed quite handy to use and very useful to add new features. Thanks a lot for your help!

thempel commented 4 years ago

You're welcome :)