Slow performance of atom selection with MDAnalysis

We use the opencadd.structure.superposition module in TeachOpenCADD talktorial 010 and observed that atom selection with MDAnalysis, which that module relies on, is slow.

See PR: https://github.com/volkamerlab/TeachOpenCADD/pull/44

cProfile / snakeviz

Profiled code

import pandas as pd

from MDAnalysis.analysis import rms

from opencadd.structure.core import Structure
from opencadd.structure.superposition.engines.mda import MDAnalysisAligner

def calc_rmsd(A, B):
    """
    Calculate RMSD between two structures.

    Parameters
    ----------
    A : opencadd.structure.core.Structure
        Structure A.
    B : opencadd.structure.core.Structure
        Structure B.

    Returns
    -------
    float
        RMSD value.
    """
    aligner = MDAnalysisAligner()
    selection, _ = aligner.matching_selection(A, B)
    A = A.select_atoms(selection['reference'])
    B = B.select_atoms(selection['mobile'])
    return rms.rmsd(A.positions, B.positions, superposition=False)

structures = [Structure.from_pdbid(pdb_id) for pdb_id in ["3w2s", "3poz"]]
proteins = [Structure.from_atomgroup(s.select_atoms("protein")) for s in structures]
calc_rmsd(proteins[0], proteins[1])
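
For reference, a minimal sketch of how such a profile could be collected with the standard library. The `workload` function is a hypothetical stand-in for the `calc_rmsd(...)` call above (it only mimics many `fnmatch` lookups), and `profile_call` and the output file name are made up for illustration; this is not necessarily how the profile in this issue was produced.

```python
import cProfile
import fnmatch
import io
import pstats


def workload():
    # Hypothetical stand-in for calc_rmsd(proteins[0], proteins[1]):
    # performs many fnmatch lookups on atom-like names.
    names = ["CA", "CB", "N", "O", "C"] * 2000
    return sum(fnmatch.fnmatch(name, "CA") for name in names)


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result, a text report, and the profiler."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue(), profiler


result, report, profiler = profile_call(workload)
print(report)

# To inspect the profile in snakeviz, dump it to a file first:
# profiler.dump_stats("calc_rmsd.prof")  # then run: snakeviz calc_rmsd.prof
```

Replacing `workload` with the real call should give a report in which the selection-related functions can be ranked by cumulative time.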

Profile

In MDAnalysis.core.selection, the fnmatch module is used to match atoms during atom selection. Find out whether we can cache the atom selection for superposition to make it a bit faster.
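
One possible direction, sketched with the standard library only: memoize the result of a name-based selection so that repeated selections with the same pattern skip the `fnmatch` matching. Everything here is an assumption for illustration; `ATOM_NAMES` and `select_indices` are hypothetical and do not reflect the MDAnalysis or opencadd API.

```python
import fnmatch
from functools import lru_cache

# Hypothetical stand-in for the atom names of a structure (not an MDAnalysis object).
# A tuple is used because lru_cache requires hashable arguments.
ATOM_NAMES = ("N", "CA", "C", "O", "CB") * 1000


@lru_cache(maxsize=128)
def select_indices(atom_names, pattern):
    """Return indices of atoms whose name matches a shell-style pattern."""
    return tuple(
        i for i, name in enumerate(atom_names) if fnmatch.fnmatchcase(name, pattern)
    )


first = select_indices(ATOM_NAMES, "C*")   # computed with fnmatch
second = select_indices(ATOM_NAMES, "C*")  # served from the cache
info = select_indices.cache_info()
print(len(first), info.hits, info.misses)
```

Hashing the name tuple on every call still costs time proportional to its length, so whether this helps in practice would have to be checked against the real profile.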
