We are using the opencadd.structure.superposition module in TeachOpenCADD talktorial 010 and observed that selecting atoms with MDAnalysis (used in that module) is slow.
import pandas as pd
from MDAnalysis.analysis import rms
from opencadd.structure.core import Structure
from opencadd.structure.superposition.engines.mda import MDAnalysisAligner


def calc_rmsd(A, B):
    """
    Calculate RMSD between two structures.

    Parameters
    ----------
    A : opencadd.structure.core.Structure
        Structure A.
    B : opencadd.structure.core.Structure
        Structure B.

    Returns
    -------
    float
        RMSD value.
    """
    aligner = MDAnalysisAligner()
    selection, _ = aligner.matching_selection(A, B)
    A = A.select_atoms(selection['reference'])
    B = B.select_atoms(selection['mobile'])
    return rms.rmsd(A.positions, B.positions, superposition=False)


structures = [Structure.from_pdbid(pdb_id) for pdb_id in ["3w2s", "3poz"]]
proteins = [Structure.from_atomgroup(s.select_atoms("protein")) for s in structures]
calc_rmsd(proteins[0], proteins[1])
Profile
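For reference, a profile like this could be collected roughly as follows. This is a minimal sketch using only the standard library. `match_names` and `profile_call` are hypothetical names, and the workload is a stand-in that wildcard-matches atom names with `fnmatch`. It is not the MDAnalysis selection code and not the exact script used for this issue.

```python
import cProfile
import fnmatch
import io
import pstats


def match_names(names, patterns):
    """Hypothetical stand-in workload: wildcard-match atom names with fnmatch."""
    return [n for n in names if any(fnmatch.fnmatch(n, p) for p in patterns)]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the ten most expensive entries by cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


atom_names = ["CA", "CB", "N", "O", "HA"] * 2000
matched, report = profile_call(match_names, atom_names, ["C*", "N"])
print(report)
```

To profile the real code, `calc_rmsd(proteins[0], proteins[1])` would presumably take the place of the stand-in call. For a graphical view, the stats can be written to a file with `profiler.dump_stats(...)` and that file opened with snakeviz.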
In MDAnalysis.core.selection, the fnmatch module is used to look up atoms during atom selection. Find out whether we can cache the atom selection so that superposition becomes a bit faster.
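One possible shape for such a cache is sketched below: a small wrapper that memoizes the result of a selection callable by its selection string. `CachedSelector` is a hypothetical helper and `slow_select` is a stand-in for an expensive call such as `select_atoms`. Neither exists in opencadd or MDAnalysis. The sketch also assumes that the topology does not change while the cache is alive.

```python
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class CachedSelector(Generic[T]):
    """Memoize a selection callable by its selection string (hypothetical helper)."""

    def __init__(self, select: Callable[[str], T]) -> None:
        self._select = select
        self._cache: Dict[str, T] = {}
        self.misses = 0  # number of times the wrapped callable actually ran

    def __call__(self, selection: str) -> T:
        if selection not in self._cache:
            self.misses += 1
            self._cache[selection] = self._select(selection)
        return self._cache[selection]


# Stand-in data and selection function; assumptions for illustration only.
ATOM_NAMES = ["N", "CA", "C", "O", "CB"]


def slow_select(selection: str) -> List[int]:
    """Return indices of atoms whose name appears in the selection string."""
    wanted = selection.split()
    return [i for i, name in enumerate(ATOM_NAMES) if name in wanted]


cached_select = CachedSelector(slow_select)
first = cached_select("CA CB")   # computed by slow_select
second = cached_select("CA CB")  # returned from the cache
print(first, cached_select.misses)
```

In the superposition code, the wrapper might be created once per structure (for example around that structure's `select_atoms` method) so that repeated identical selection strings skip the name matching. Whether this pays off depends on how often the same selection string recurs, which the profile would need to confirm.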
See PR: https://github.com/volkamerlab/TeachOpenCADD/pull/44

Profiled with cProfile / snakeviz.